I got ROCm working on my laptop, and you know what that means: benchmarks!
Hardware
- Form factor: Laptop
- Model: Asus G513QY
- CPU: AMD Ryzen 9 5900HX (83rd percentile on cpu.userbenchmark.com, ~10% slower than a desktop Ryzen 5 5500)
- GPU: AMD Radeon RX 6800M (91st percentile on gpu.userbenchmark.com, ~15% slower than a desktop RTX 3060, 1/5th the power of a desktop RTX 4090)
TensorFlow
Model Summary
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 26, 26, 32)        320
 conv2d_1 (Conv2D)           (None, 24, 24, 64)        18496
 max_pooling2d (MaxPooling2D  (None, 12, 12, 64)       0
 )
 dropout (Dropout)           (None, 12, 12, 64)        0
 flatten (Flatten)           (None, 9216)              0
 dense (Dense)               (None, 128)               1179776
 dense_1 (Dense)             (None, 10)                1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
Code can be found here
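As a sanity check on the summary above (separate from the linked code), the parameter counts can be reproduced by hand. This sketch assumes a 28x28 single-channel input, 3x3 kernels without padding, and 2x2 max pooling. None of that is stated in the post; it is inferred from the output shapes and the 320 parameters of the first layer.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights (kernel * kernel * in * out) plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights (in * out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Assumed input: 28x28x1 (inferred, not stated in the post).
side = 28
side -= 2    # 3x3 conv, no padding -> 26
side -= 2    # 3x3 conv, no padding -> 24
side //= 2   # 2x2 max pooling      -> 12
flat = side * side * 64  # flatten 12 x 12 x 64

layers = {
    "conv2d": conv2d_params(3, 1, 32),
    "conv2d_1": conv2d_params(3, 32, 64),
    "dense": dense_params(flat, 128),
    "dense_1": dense_params(128, 10),
}
total = sum(layers.values())

print(flat)    # 9216
print(layers)  # {'conv2d': 320, 'conv2d_1': 18496, 'dense': 1179776, 'dense_1': 1290}
print(total)   # 1199882
```

Pooling, dropout, and flatten layers have no weights, which is why they show 0 in the summary. Almost all of the parameters sit in the first dense layer.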
CPU

Took 654 seconds (10 minutes, 54 seconds).
GPU

Took 63 seconds (1 minute, 3 seconds).
PyTorch
Model Summary
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 26, 26]             320
            Conv2d-2           [-1, 64, 24, 24]          18,496
           Dropout-3           [-1, 64, 12, 12]               0
            Linear-4                  [-1, 128]       1,179,776
           Dropout-5                  [-1, 128]               0
            Linear-6                   [-1, 10]           1,290
               Net-7                   [-1, 10]               0
================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
----------------------------------------------------------------
Code can be found here
CPU

Took 458 seconds (7 minutes, 38 seconds).
GPU

Took 110 seconds (1 minute, 50 seconds).
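The post does not say how the runs were timed. A minimal sketch of one way to get wall-clock numbers in the same format, using only the standard library, is shown here. The names `format_duration` and `timed` are made up for illustration.

```python
import time
from contextlib import contextmanager


def _unit(n, name):
    """Pluralize a unit name: '1 minute', '3 seconds'."""
    return f"{n} {name}" + ("" if n == 1 else "s")


def format_duration(seconds):
    """Format elapsed seconds like the timings quoted in this post."""
    total = round(seconds)
    minutes, secs = divmod(total, 60)
    return f"Took {_unit(total, 'second')} ({_unit(minutes, 'minute')}, {_unit(secs, 'second')})."


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {format_duration(time.perf_counter() - start)}")


print(format_duration(654))  # Took 654 seconds (10 minutes, 54 seconds).
print(format_duration(63))   # Took 63 seconds (1 minute, 3 seconds).
```

Wrapping the training call in `with timed("TensorFlow GPU"):` would then print one line per run. Wall-clock time includes data loading and any one-off kernel compilation, so a short run can look slower than the steady state.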
CPU usage compared
Measured during CPU runs.
TensorFlow

Load Average: 11.4
PyTorch

Load Average: 7.05
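Load average is easier to judge relative to the number of hardware threads. The sketch below shows one way to sample it on Linux with `os.getloadavg` and to normalize the two figures above. It assumes 16 hardware threads, which is what the Ryzen 9 5900HX (8 cores, 16 threads) provides; the helper names are made up for illustration, and the post does not say which tool produced its numbers.

```python
import os


def per_thread(load, threads):
    """Load average divided by the number of hardware threads."""
    return load / threads


def load_snapshot():
    """Sample the 1-minute load average of the current machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    threads = os.cpu_count() or 1
    return {
        "load_1m": one_minute,
        "threads": threads,
        "per_thread": per_thread(one_minute, threads),
    }


# Figures from the CPU runs above; 16 threads is an assumption about the 5900HX.
THREADS = 16
print(round(per_thread(11.4, THREADS), 2))  # 0.71
print(round(per_thread(7.05, THREADS), 2))  # 0.44
```

On that assumption, the TensorFlow run kept roughly 70% of the hardware threads busy and the PyTorch run roughly 44%, so neither saturated the CPU.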
Inferences
- TensorFlow GPU is about 10 times faster than TensorFlow CPU (63 vs 654 seconds).
- PyTorch GPU is about 4 times faster than PyTorch CPU (110 vs 458 seconds).
- PyTorch CPU is about 1.4 times faster than TensorFlow CPU (458 vs 654 seconds).
- TensorFlow GPU is about 1.7 times faster than PyTorch GPU (63 vs 110 seconds).
- TensorFlow is much harder on the CPU than PyTorch (11.4 vs 7.05 load average).
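The ratios in the list above come straight from the measured times. A short sketch of the arithmetic, using only the numbers reported in this post:

```python
# Wall-clock times in seconds, as reported above.
times = {
    ("tensorflow", "cpu"): 654,
    ("tensorflow", "gpu"): 63,
    ("pytorch", "cpu"): 458,
    ("pytorch", "gpu"): 110,
}


def speedup(slow, fast):
    """How many times faster the `fast` run is than the `slow` run."""
    return times[slow] / times[fast]


ratios = {
    "tf_gpu_vs_tf_cpu": speedup(("tensorflow", "cpu"), ("tensorflow", "gpu")),
    "pt_gpu_vs_pt_cpu": speedup(("pytorch", "cpu"), ("pytorch", "gpu")),
    "pt_cpu_vs_tf_cpu": speedup(("tensorflow", "cpu"), ("pytorch", "cpu")),
    "tf_gpu_vs_pt_gpu": speedup(("pytorch", "gpu"), ("tensorflow", "gpu")),
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
# tf_gpu_vs_tf_cpu: 10.4x
# pt_gpu_vs_pt_cpu: 4.2x
# pt_cpu_vs_tf_cpu: 1.4x
# tf_gpu_vs_pt_gpu: 1.7x
```

These are single-machine, few-run numbers, so the first decimal place should not be read too closely.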
Conclusions / Recommendations
Disclaimer: Everything is based on a few runs on my machine with ROCm. Results could vary wildly for other use cases.
- A GPU can greatly speed up workflows.
- Even cheaper graphics cards help.
- This is because GPUs have a parallel architecture and are better optimized for lower-precision calculations and matrix multiplications.
 
- If there is a GPU available, use TensorFlow
- TensorFlow is much faster (1.7 times) than PyTorch on the GPU.
 
- If there is no GPU available, use PyTorch
- TensorFlow really pounds the CPU (11.4 vs 7.05 load average) despite being slower (654 vs 458 seconds).
 
Let me know if you have suggestions / corrections.
Have a good time ahead!