In 2010, semiconductor manufacturers began moving the algorithmically intensive portions of the AES cipher on-die in the form of the AES-NI instruction set. Many cryptographic APIs and applications have enabled support for this new technology, and none hesitate to tout the promise of major performance improvements. Intel demonstrates 3x to 10x acceleration versus pure software implementations, while the authors of TrueCrypt set the expectation of 4x to 8x speed gains. Can these performance boosts be realized in practice, and how much of these gains can be captured in present day, real world scenarios?
Measured “in the vacuum” of the main memory bus, AES-NI certainly delivers on its performance promises. The following benchmarks were recorded with TrueCrypt’s integrated benchmarking facility on a system equipped with an Intel Core i5-2520M Sandy Bridge processor sporting the AES extensions. When enabled on this system, hardware acceleration resulted in a more than 5x speed boost in AES encryption and decryption performance, bumping throughput up from 277 MB/s to 1.5 GB/s. Even cascaded modes saw significant speed gains.
Real world applications, however, do not take place in a vacuum, and most users would be hard pressed to produce a data stream anywhere close to even the lower of the two measured speeds. Encrypted streams don’t spontaneously originate in main memory with its tens of GB/s of bandwidth. They come from storage devices and network sockets. No rotating mechanical disk drive achieves such high transfer rates at this time (try ~120 MB/s), nor does gigabit ethernet (125 MB/s). Contemporary SATA controllers do in theory (300 MB/s and 600 MB/s), and solid state disks can max out their bandwidth, but they do not mix well with encryption due to wear leveling. Some RAID configurations could go there in theory, but doubtfully in practice. 10 gigabit ethernet (1.25 GB/s) could break these speeds, but network hardware that can operate at such rates remains exotic, largely restricted to industrial contexts.
When performance demand is less than what could be supplied absent hardware acceleration, the acceleration is a “nice to have” and might have incidental benefits; in throughput terms, however, its performance impact is zero. Hardware-accelerated AES can ultimately only be said to yield a material, tangible speed boost when the AES cipher is operating on a stream of greater bandwidth than a software-only implementation could keep up with at that time. This is not typically the case in practical scenarios with realistic present day hardware.
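The argument above amounts to a simple bottleneck model: an encrypted pipeline runs at the speed of its slowest stage, so effective throughput is the minimum of the source's bandwidth and the cipher's throughput. A back-of-the-envelope sketch using the figures cited earlier (an illustrative model, not a measurement):

```python
# Bottleneck model: a pipeline runs at the speed of its slowest stage.
# Cipher figures (MB/s) are the ones measured earlier in this article;
# source bandwidths are the rough numbers cited above.

SOFTWARE_AES = 277   # software-only AES throughput, MB/s
AESNI_AES = 1500     # hardware-accelerated AES throughput, MB/s

sources = {
    "mechanical disk":      120,
    "gigabit ethernet":     125,
    "SATA II":              300,
    "SATA III":             600,
    "10 gigabit ethernet": 1250,
}

def effective_throughput(source_mbps, cipher_mbps):
    """The stream can move no faster than its slowest stage."""
    return min(source_mbps, cipher_mbps)

for name, bw in sources.items():
    soft = effective_throughput(bw, SOFTWARE_AES)
    hard = effective_throughput(bw, AESNI_AES)
    print(f"{name:20s}: {soft:4d} MB/s -> {hard:4d} MB/s "
          f"({hard / soft:.2f}x)")
```

Run against these numbers, the model shows no throughput gain at all for the mechanical disk or gigabit ethernet (both already slower than software AES), a modest 1.08x for SATA II, and only the SATA III and 10 gigabit cases recovering a meaningful fraction of the 5x benchmark speedup.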
So, does AES-NI make a difference? Absolutely. It all but eliminates the possibility that the cipher will act as the performance bottleneck in any given application. It reduces the risk of an erroneous or adulterated AES implementation in software, and it mitigates side channel attacks. It conserves CPU cycles, which conserves power. And, there are circumstances when hardware-accelerated AES would have unambiguous, meaningful performance impact:
• When the stream is extremely fast: you’re lucky enough to have an unusually high performance configuration, you’re in a research lab, or you’re in the future with more advanced hardware than exists today
• When the CPU is underpowered with respect to the other bottlenecks in the system: this could be a factor as AES-NI becomes available on lower powered chips, although every model Intel has shipped with AES on-die appears to be no less than half as fast as the one tested here
• When the CPU is substantially taxed at the time by other processes, which is wholly conceivable
But, in spite of all the benefits hardware-accelerated AES brings, it would be naive to regard the technical upper speed limits illustrated in benchmarks as real world performance targets. AES-NI is better viewed as future-proofing, which is no doubt what Intel and AMD are up to with their investment in AES technology.