MNIST Benchmarks (CPU vs GPU / Tensorflow vs Pytorch)

Tue, 08 Aug 2023 00:00:00 +0000

I got ROCm to work on my laptop and guess what that means, benchmarks!

Hardware

Form factor: Laptop
Model: Asus G513QY
CPU: AMD Ryzen 5900HX (83rd percentile on cpu.userbenchmark.com, ~ 10% slower than a desktop 5500)
GPU: AMD Radeon 6800m (91st percentile on gpu.userbenchmark.com, ~15% slower than a desktop 3060, 1/5th the power of a desktop 4090)

Tensorflow

Model Summary

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 26, 26, 32)        320

 conv2d_1 (Conv2D)           (None, 24, 24, 64)        18496

 max_pooling2d (MaxPooling2D  (None, 12, 12, 64)       0
 )

 dropout (Dropout)           (None, 12, 12, 64)        0

 flatten (Flatten)           (None, 9216)              0

 dense (Dense)               (None, 128)               1179776

 dense_1 (Dense)             (None, 10)                1290

=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________

Code can be found here

CPU

Took 654 seconds (10 minutes, 54 seconds).

GPU

Took 63 seconds (1 minute, 3 seconds).

Pytorch

Model Summary

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 26, 26]             320
            Conv2d-2           [-1, 64, 24, 24]          18,496
           Dropout-3           [-1, 64, 12, 12]               0
            Linear-4                  [-1, 128]       1,179,776
           Dropout-5                  [-1, 128]               0
            Linear-6                   [-1, 10]           1,290
               Net-7                   [-1, 10]               0
================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
----------------------------------------------------------------

Code can be found here

CPU

Took 458 seconds (7 minutes 38 seconds).

GPU

Took 110 seconds (1 minute, 50 seconds).

CPU usages compared

Measured during CPU runs.

Tensorflow

Load Average: 11.4

Pytorch

Load Average: 7.05

Inference

Tensorflow GPU is about 10 times faster than Tensorflow CPU (654 vs 63 seconds).
Pytorch GPU is about 4 times faster than Pytorch CPU (458 vs 110 seconds).
Pytorch CPU is about 1.4 times faster than Tensorflow CPU (654 vs 458 seconds).
Tensorflow GPU is about 1.7 times faster than Pytorch GPU (110 vs 63 seconds)
Tensorflow is much harder on the CPU than Pytorch (11.4 vs 7.05 load average).

Conclusions / Recommendations

Disclaimer: Everything is based on a few runs on my machine with ROCm. Results could wildly vary for other use cases.

A GPU can greatly speedup workflows.
- Even cheaper graphics cards help.
- This is due to GPUs’ parallel architecture and being better optimized for lower precision calculations and matrix multiplications.
If there is GPU available, use Tensorflow
- Tensorflow is much faster (1.7 times) than Pytorch with GPU.
If there is no GPU available, use Pytorch
- Tensorflow really pounds the CPU (11.4 vs 7.05 load avg) albeit being slow (654 vs 458 seconds).

Let me know if you have suggestions / corrections.

Have a good time ahead!

List of Blog Posts on Berin's Webpage

MNIST Benchmarks (CPU vs GPU / Tensorflow vs Pytorch)

Hardware

Tensorflow

Model Summary

CPU

GPU

Pytorch

Model Summary

CPU

GPU

CPU usages compared

Tensorflow

Pytorch

Inference

Conclusions / Recommendations