Whenever I get my hands on a new device or machine, one of the first things I do is run `openssl speed -evp <cipher>` and log the throughput for 16k (16,384-byte) blocks, specifically for the AES-256-GCM and ChaCha20-Poly1305 ciphers. These ciphers are widely used for VPNs and TLS, and both benefit from dedicated instructions and accelerated units on modern CPUs: AES has dedicated instructions for performing its rounds (AES-NI), while ChaCha20 benefits from SIMD vector operations (AVX2, Neon). Modern CPUs reflect this immediately with high throughput, regardless of clock speed.
All of this started with me trying to find a cheap VM where I could run my self-hosted stuff. There are plenty of cloud providers offering Linux VMs on different CPUs at multiple price points. I wanted to run quick tests and compare them against something I could run on my own machines, weighing how much value the cloud VMs give me compared to just hosting things at home. These quick tests turned into a habit: any device I have, or can borrow, that has OpenSSL ends up with me running `openssl speed` on its terminal.
Things get a bit trickier for non-Linux devices, but the tests can still be done with little effort. On Windows, OpenSSL comes with either a Git installation or WSL. On Macs, an OpenSSL that supports these ciphers comes from Homebrew, and in my experience people who install Homebrew have usually used it to install something that depends on OpenSSL as well. Android is the odd one out: it needs the Termux APK. Few people have it on their phones, so you have to make sure they’re okay with installing an APK.
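Locating the right binary on each platform can be scripted. The sketch below assumes default install locations: Homebrew's `openssl@3` under `/opt/homebrew` (Apple Silicon) or `/usr/local` (Intel), and the OpenSSL bundled with Git for Windows under `C:\Program Files\Git\usr\bin`. These paths and the `find_openssl` helper are assumptions for illustration; anything non-default falls back to whatever `openssl` is on PATH, which covers Linux, WSL, and Termux.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional

# Assumed default install locations; adjust for non-default setups.
CANDIDATE_PATHS = {
    # Homebrew's openssl@3 on Apple Silicon and Intel Macs
    "Darwin": [
        "/opt/homebrew/opt/openssl@3/bin/openssl",
        "/usr/local/opt/openssl@3/bin/openssl",
    ],
    # OpenSSL bundled with Git for Windows
    "Windows": [r"C:\Program Files\Git\usr\bin\openssl.exe"],
}


def find_openssl(system: Optional[str] = None) -> Optional[str]:
    """Return a path to an OpenSSL binary, preferring the assumed locations above."""
    system = system or platform.system()
    for candidate in CANDIDATE_PATHS.get(system, []):
        if Path(candidate).is_file():
            return candidate
    # Linux, WSL, and Termux: fall back to whatever `openssl` is on PATH.
    return shutil.which("openssl")
```

It returns `None` when no binary is found, which on a borrowed device is a signal to move on rather than install something.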
I have to emphasize that the results here are imprecise and should only be taken at face value. Unaccounted-for variables include thermal and power constraints, server tenancy (for cloud VMs), background processes, OpenSSL version and build flags, etc. Putting effort into accounting for these would make the testing less opportunistic. My rule of thumb is that I have to be able to do all of this on a borrowed device in under 3 minutes without installing anything new.
So here’s my record of running OpenSSL speed on as many devices as I could get my hands on:
Device Name | Device Type | Details (OS, etc.) | AES-256-GCM @ 16k | ChaCha20-Poly1305 @ 16k |
---|---|---|---|---|
GCP e2-micro (2C1G) asia-east2 | VM - Cloud | Ubuntu, Skylake (w/ AVX512) | 2.997 GB/s | 2.041 GB/s |
GCP e2-micro (2C1G) asia-east1 | VM - Cloud | Ubuntu, Haswell | 2.984 GB/s | 1.415 GB/s |
GCP f1-micro (1C1G) us-west1 | VM - Cloud | Ubuntu, Skylake, free-tier zone | 0.594 GB/s | 0.387 GB/s |
GCP n2-standard-2 (2C8G) asia-southeast1 | VM - Cloud | Ubuntu, Ice Lake | 5.462 GB/s | 3.878 GB/s |
GCP c3-highcpu-4 (4C8G) us-central1-a | VM - Cloud | Ubuntu, Sapphire Rapids (No QAT) | 4.843 GB/s | 3.545 GB/s |
AWS Lightsail (1C2G) ap-southeast-1 | VM - Cloud | Ubuntu, Haswell | 2.005 GB/s | 1.373 GB/s |
AWS t4g.micro (2C1G) ap-east-1 | VM - Cloud | Ubuntu, ARM N1, Graviton2 | 2.014 GB/s | 0.686 GB/s |
AWS c7g.medium (1C2G) us-west-1 | VM - Cloud | Ubuntu, ARM V1, Graviton3 | 3.944 GB/s | 1.093 GB/s |
Ampere Altra (1C4G) | VM - Cloud | Ubuntu, ARM N1, Altra | 2.401 GB/s | 0.874 GB/s |
DigitalOcean (1C1G) sgp1 | VM - Cloud | Ubuntu, Cascade Lake (w/ AVX512) | 3.430 GB/s | 2.002 GB/s |
DigitalOcean (1C1G) sgp1 | VM - Cloud | Ubuntu, Zen2 (Rome) | 2.804 GB/s | 1.476 GB/s |
Intel Core i7-2600 | VM - Local | Ubuntu on Proxmox, Sandy Bridge | 1.235 GB/s | 0.939 GB/s |
Intel Core i3-7100U | Laptop | Clear Linux, Kaby Lake | 2.694 GB/s | 1.425 GB/s |
Laptop, MacBook Air | Laptop | macOS, ARM M1, OpenSSL from Brew | 5.952 GB/s | 1.835 GB/s |
Laptop, MacBook Pro i7-8750H | Laptop | macOS, Intel, OpenSSL from Brew | 4.391 GB/s | 2.362 GB/s |
Laptop, MacBook Air 2017 i5-5350U | Laptop | macOS, Intel, OpenSSL from Brew | 2.756 GB/s | 1.531 GB/s |
AMD Ryzen 5 2600X | Desktop | Windows 11 - WSL2, Zen+ | 4.324 GB/s | 1.207 GB/s |
AMD Threadripper TR 2950X | Desktop | Windows 11 - Host (from Git), Zen+ | 4.568 GB/s | 1.281 GB/s |
Desktop, i7-8700 | Desktop | Ubuntu, Coffee Lake | 5.121 GB/s | 2.728 GB/s |
Desktop, Mac Mini i5-4278U | Desktop | macOS, Intel (late 2014) | 2.348 GB/s | 1.655 GB/s |
Desktop, iMac i5-2400S | Desktop | macOS, Intel (mid 2011) | 1.111 GB/s | 0.822 GB/s |
MediaTek MT7621 | Router | OpenWRT, Mi Router 4A-G | 0.005 GB/s | 0.019 GB/s |
Broadcom BCM4906 | Router | AsusWRT-Merlin, RT-AC86U | 0.608 GB/s | 0.270 GB/s |
Phone, Exynos 2100 | Phone | Android 12, S21 Ultra | 2.421 GB/s | 1.115 GB/s |
Tablet, Helio A22 | Phone | Android 10, Lenovo Tab M8 | 0.838 GB/s | 0.334 GB/s |
Intel Atom x5-Z8350 | Server | Clear Linux | 0.226 GB/s | 0.188 GB/s |
Intel Core 2 Duo E6750 | Server | Ubuntu, LGA775-era | 0.067 GB/s | 0.392 GB/s |
Raspberry Pi 3B | SBC/Server | Ubuntu aarch64 | 0.023 GB/s | 0.098 GB/s |
NVIDIA Jetson Nano | SBC/Server | Ubuntu aarch64 A57 | 0.738 GB/s | 0.265 GB/s |
Intel Xeon E5-2620v4 | Server | Ubuntu, Broadwell | 2.427 GB/s | 1.375 GB/s |
Intel Core i7-4790 | Server | Ubuntu, Haswell | 2.985 GB/s | 2.030 GB/s |
AMD EPYC 7282 | Server | Ubuntu, Zen2 (Rome) | 3.462 GB/s | 1.748 GB/s |
Intel Xeon E5405 | Server | Ubuntu, Penryn | 0.079 GB/s | 0.337 GB/s |
Note: I will add more entries and observations as I get access to more devices to test. I might have to restructure the columns as well. The active document can be found here.
Some observations:
- CPU generation matters a lot: throughput for both ciphers rises as SIMD lanes get wider and instruction sets get more advanced. For cloud providers, make sure the price of a VM is in line with the CPU generation it runs on.
- Expect server CPUs to be slower than their desktop and laptop counterparts of the same generation. Servers don’t have the aggressive clock boosting that desktops, or even phones, have.
- The Apple M1 had the largest absolute gap between the two measured throughputs. While the M1 was undoubtedly one of the most advanced processors of 2020, as its table-leading AES result shows, its SIMD gains were only moderate, as its ChaCha result shows.
- On the other hand, very old CPUs tend to have better ChaCha performance than AES. This is a large part of the appeal of ChaCha20-Poly1305 as the cipher for WireGuard: old devices and phones can still get decent VPN throughput despite lacking dedicated AES units.
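The AES-to-ChaCha ratio makes the last two observations easy to check. This Python sketch uses a handful of rows copied from the table; treating a ratio below 1 as a sign of a missing AES hardware path is my heuristic, not a definitive test, and the function names are hypothetical.

```python
# (AES-256-GCM, ChaCha20-Poly1305) throughput at 16k in GB/s, copied from the table.
RESULTS = {
    "MacBook Air (M1)": (5.952, 1.835),
    "AWS c7g.medium (Graviton3)": (3.944, 1.093),
    "Desktop i7-8700": (5.121, 2.728),
    "Intel Core 2 Duo E6750": (0.067, 0.392),
    "Intel Xeon E5405": (0.079, 0.337),
    "Raspberry Pi 3B": (0.023, 0.098),
}


def aes_chacha_ratio(aes: float, chacha: float) -> float:
    """How many times faster AES-256-GCM ran than ChaCha20-Poly1305."""
    return aes / chacha


def favors_chacha(aes: float, chacha: float) -> bool:
    """Heuristic: ChaCha winning usually points to no usable AES hardware path."""
    return aes_chacha_ratio(aes, chacha) < 1.0


# Print rows from most AES-leaning to most ChaCha-leaning.
for name, (aes, chacha) in sorted(
    RESULTS.items(), key=lambda item: aes_chacha_ratio(*item[1]), reverse=True
):
    ratio = aes_chacha_ratio(aes, chacha)
    gap = aes - chacha
    label = "ChaCha faster" if favors_chacha(aes, chacha) else "AES faster"
    print(f"{name:28} ratio={ratio:5.2f} gap={gap:+.3f} GB/s  {label}")
```

Ratio and absolute gap tell slightly different stories: by ratio, Graviton3 (about 3.6x) edges out the M1 (about 3.2x), while the M1 still has the largest absolute gap in the table.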
With all the information above, what did I end up using to host my stuff online? So far, GCP’s e2-micro in asia-east2 wins in terms of value. Before comparing performance results, I had to consider latency and price: I think 15 USD a month is too much to pay for anything, and the ping from home has to be under 30 ms (latency-wise, it’s HK > TW > SG here regardless of ISP). While e2-micro in asia-east1 also fits the bill, you are more likely to get E2 instances that are Skylake or later in HK. Who knows, this might change as CPU offerings get more modern and pricing becomes more competitive.
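The selection order described here (hard limits on price and latency first, performance second) can be sketched in Python. The `Candidate` fields and `pick_vm` are hypothetical, and ranking survivors by the weaker of the two cipher results is my assumption; the post does not define a single performance score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    monthly_usd: float  # price per month
    ping_ms: float      # round-trip latency from home
    aes_gbps: float     # AES-256-GCM @ 16k, GB/s
    chacha_gbps: float  # ChaCha20-Poly1305 @ 16k, GB/s


def pick_vm(
    candidates: list[Candidate],
    max_monthly_usd: float = 15.0,
    max_ping_ms: float = 30.0,
) -> Optional[Candidate]:
    """Filter on price and latency first, then rank survivors by throughput."""
    eligible = [
        c for c in candidates
        if c.monthly_usd < max_monthly_usd and c.ping_ms < max_ping_ms
    ]
    if not eligible:
        return None
    # Assumed ranking: the weaker of the two ciphers, so one fast cipher
    # cannot mask a slow one.
    return max(eligible, key=lambda c: min(c.aes_gbps, c.chacha_gbps))
```

Filtering before ranking means a faster but pricier or more distant VM never wins, which matches the order in which the constraints are described above.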