Vipin


Benchmarking using "openssl speed" on ZedBoard

I am trying to benchmark AES and RSA on a ZedBoard using openssl and calculate the time it takes to encrypt or decrypt one block of data. I am able to get results with the openssl speed command. However, I am confused by some of the output, and I am hoping someone with experience in this area can shed some light.

For instance, consider the output of the openssl speed -elapsed -evp aes-128-cbc command below:

# openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 3721379 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1035700 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 268675 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 67840 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 8523 aes-128-cbc's in 3.00s
OpenSSL 1.0.2j  26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) bl
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      19847.35k    22094.93k    22926.93k    23156.05k    23273.47k

In the last line of the throughput results, I understand that with 16-byte blocks, running AES-128 in CBC mode for 1 second processes about 19.85 MB of data. Is it correct to say that encrypting one block of data takes 0.806 μs? ((16/19.85)*10^-6 sec)
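The arithmetic in the question can be checked directly against the output above. A minimal Python sketch, using only the numbers printed by the speed run: it derives the average time per 16-byte block two ways, from the raw count line (elapsed seconds / number of calls) and from the throughput table (where "k" means 1000 bytes per second), and also recomputes the table entry from the count to show that both come from the same measurement. The function names are illustrative only.

```python
# Numbers copied from the "openssl speed -elapsed -evp aes-128-cbc" output above.
ELAPSED_S = 3.00          # "... in 3.00s"
BLOCK_BYTES = 16          # "on 16 size blocks"
OPS = 3721379             # "3721379 aes-128-cbc's"
TABLE_KBPS = 19847.35     # table entry for 16 bytes, in 1000s of bytes per second


def us_per_op_from_count(elapsed_s, ops):
    """Average time per call in microseconds: elapsed time / number of calls."""
    return elapsed_s / ops * 1e6


def us_per_op_from_table(block_bytes, kbytes_per_s):
    """Average time per call in microseconds from the table ('k' = 1000 bytes/s)."""
    return block_bytes / (kbytes_per_s * 1000) * 1e6


def table_entry(block_bytes, ops, elapsed_s):
    """Recompute the table entry (in 1000s of bytes/s) from the raw count."""
    return block_bytes * ops / elapsed_s / 1000


print(f"{us_per_op_from_count(ELAPSED_S, OPS):.3f} us")          # 0.806 us
print(f"{us_per_op_from_table(BLOCK_BYTES, TABLE_KBPS):.3f} us")  # 0.806 us
print(f"{table_entry(BLOCK_BYTES, OPS, ELAPSED_S):.2f}k")         # 19847.35k
```

Both routes land on about 0.806 μs per 16-byte block, which matches the (16/19.85)*10^-6 s estimate in the question.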

Secondly, when testing RSA speed with the openssl speed command, does it use any dedicated hardware (such as the HSM on the ZedBoard that is used for signing partitions of a boot image and generating RSA keys)? Or is there a way to check whether this hardware module is used during the speed test? Also, the RSA speed output does not mention the size of the input message for signing. Is it calculated from the key size, as per the PKCS#1 v2.1 standard? maxInputSize_inBytes <= (keySize_inBits/8) - 11
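For reference, the "- 11" limit quoted above is the PKCS#1 v1.5 padding rule (which PKCS#1 v2.1 still includes): with a k-byte modulus, the padded payload can be at most k - 11 bytes. PKCS#1 v2.1 also defines OAEP padding for encryption, whose limit is k - 2*hLen - 2, where hLen is the hash output length in bytes. A minimal Python sketch of both formulas; the key sizes and the SHA-1 hash length (20 bytes) are example inputs, and this says nothing about what message openssl speed actually signs internally.

```python
def modulus_bytes(key_bits: int) -> int:
    """Key (modulus) size k in bytes, rounding up."""
    return (key_bits + 7) // 8


def pkcs1_v15_max_bytes(key_bits: int) -> int:
    """Largest payload that fits PKCS#1 v1.5 padding: k - 11 bytes."""
    return modulus_bytes(key_bits) - 11


def oaep_max_bytes(key_bits: int, hash_bytes: int) -> int:
    """Largest plaintext for RSAES-OAEP (PKCS#1 v2.x): k - 2*hLen - 2 bytes."""
    return modulus_bytes(key_bits) - 2 * hash_bytes - 2


for bits in (512, 1024, 2048, 4096):
    # Columns: key bits, v1.5 limit, OAEP limit with SHA-1 (hLen = 20)
    print(bits, pkcs1_v15_max_bytes(bits), oaep_max_bytes(bits, 20))
# 512 53 22
# 1024 117 86
# 2048 245 214
# 4096 501 470
```

Note that for signatures the message is hashed first, so only the (fixed-size) hash encoding has to fit within the limit, not the original message.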

Lastly, is profiling an RSA implementation with the perf tool (counting the number of clock cycles and calculating the execution time) the same as running the openssl speed test?

I am a beginner in this area, and I apologize if my questions are too naive. Thanks in advance!



Answers (1)

Peter Cordes


Reciprocal throughput is not necessarily equal to latency for operations this short. The 3.00 s / 3721379 = 0.806 μs figure is the average time when the encryption function is hot in the I-cache and the branch predictors are primed. If you call that function only occasionally as part of a larger program, it might be slower.
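To make "average over a hot loop" concrete, here is a minimal Python sketch of the kind of measurement that produces such a number: call an operation back-to-back for a fixed wall-clock budget, count the calls, and divide. The operation is a hypothetical stand-in (SHA-256 of one 16-byte block via hashlib), not AES, and Python's own call and loop overhead dominates the result; the point is only that elapsed / calls is a reciprocal throughput, not the latency of one isolated call.

```python
import hashlib
import time


def reciprocal_throughput(op, budget_s=0.2):
    """Call op() back-to-back for about budget_s seconds of wall-clock time.

    Returns (calls, elapsed_s, avg_s_per_call). avg_s_per_call is
    elapsed / calls: an average over a hot loop (code and data already
    cached, branch predictors trained), not the latency of a single call.
    Loop overhead, including the perf_counter() checks, is included.
    """
    calls = 0
    start = time.perf_counter()
    deadline = start + budget_s
    while True:
        op()
        calls += 1  # at least one call is always counted
        if time.perf_counter() >= deadline:
            break
    elapsed = time.perf_counter() - start
    return calls, elapsed, elapsed / calls


block = bytes(16)  # one 16-byte block of zeros


def op():
    # Hypothetical stand-in for the real cipher call being benchmarked.
    hashlib.sha256(block).digest()


calls, elapsed, avg = reciprocal_throughput(op)
print(f"{calls} calls in {elapsed:.2f}s -> {avg * 1e6:.3f} us/call (hot loop)")
```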

I don't know whether any of the ZedBoard models have out-of-order execution CPUs, but even if they do, the out-of-order window and memory-reordering buffers are presumably fairly limited in size for a low-power device. (Intel Skylake's reorder buffer is 224 uops, but low-power cores might have an out-of-order window of only 20 to 40 instructions.)

Anyway, 0.8 μs is probably a reasonable estimate of what calling this function on a 16-byte block costs the surrounding code, i.e. the time needed to get that work through the pipeline.

I'm probably making too big a deal out of latency vs. throughput. 800 nanoseconds is 800 clock cycles on a 1 GHz CPU, so most of the work probably can't overlap with the surrounding code, and it really is a cost you can simply add to other costs. But in general, at small scales, that's not how performance works on pipelined CPUs, especially out-of-order (OoO) ones.
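The 800-cycle figure is just time multiplied by clock frequency, and the same conversion run backwards is how a cycle count (for example from a profiler such as perf) can be compared with openssl speed's wall-clock numbers, assuming the core clock stays fixed during the run. A small Python sketch; the 1 GHz value is the example from the text, and the real clock of your board's CPU is something you would need to check yourself.

```python
def seconds_to_cycles(seconds: float, clock_hz: float) -> float:
    """Convert a duration to core clock cycles, assuming a fixed clock frequency."""
    return seconds * clock_hz


def cycles_to_seconds(cycles: float, clock_hz: float) -> float:
    """Inverse conversion, e.g. turning a measured cycle count into time."""
    return cycles / clock_hz


avg_s = 3.00 / 3721379  # average time per 16-byte block from the speed run above

print(round(seconds_to_cycles(avg_s, 1e9)))    # 806 (cycles at 1 GHz)
print(round(seconds_to_cycles(800e-9, 1e9)))   # 800
```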


Sorry, I don't know anything specific about your hardware, so I can't comment on the other two parts of the question.

Other answers are needed to address the rest of the question; this is only a general answer to the first part.

