dimba

Reputation: 27571

Fully utilizing HW accelerator

We would like to use OpenSSL for handling all our SSL communication (both client and server sides), and to use a HW acceleration card for offloading the heavy cryptographic calculations.

We noticed that the OpenSSL 'speed' test calls the cryptographic functions directly (e.g. RSA_sign/decrypt). To fully utilize the HW capacity, multiple threads were needed (up to 128) to keep the card loaded with requests so that it is never idle.

We would like to use the high-level OpenSSL API for handling SSL connections (e.g. SSL_connect/read/write/accept), but this API doesn't expose the point where the actual cryptographic operation is done. For example, when calling SSL_connect, we don't know where the RSA operations happen, so we cannot tell in advance which calls will lead to heavy cryptographic calculations and hand only those to the accelerator.

Questions:

  1. How can I use the high level API while still fully utilizing the HW accelerator? Should I use multiple threads?
  2. Is there a 'standard' way of doing this? (implementation example)
  3. (Answered in UPDATE) Are you familiar with Intel's asynchronous OpenSSL? It seems they were trying to solve this exact issue, but we cannot find the actual code or usage examples.

UPDATE

  1. The paper "Accelerating OpenSSL* Using Intel® QuickAssist Technology" shows that Intel also mentions using multiple threads/processes:

    The standard release of OpenSSL is serial in nature, meaning it handles one connection within one context. From the point of view of cryptographic operations, the release is based on a synchronous/ blocking programming model. A major limitation is throughput can be scaled higher only by adding more threads (i.e., processes) to take advantage of core parallelization, but this will also increase context management overhead.

  2. We finally found Intel's OpenSSL branch here. More information can be found in the PDF contained here.

    It looks like Intel changed the way the OpenSSL ENGINE works: it posts work to the driver and returns immediately, and the corresponding result must then be polled for.

    If you use another SSL accelerator, then the corresponding OpenSSL ENGINE would have to be modified too.
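    To make the post-and-poll idea concrete, here is a toy simulation in plain C. It is not Intel's code and does not touch OpenSSL or any driver; the names fake_device, device_submit, device_run and device_poll are all hypothetical, and the "hardware" just squares an integer. It only sketches the calling pattern: submit returns a ticket at once, and the result is collected later by polling.

```c
#include <stddef.h>

#define QUEUE_DEPTH 8   /* hypothetical number of in-flight requests */

typedef struct {
    int in_use;   /* slot holds a submitted request */
    int done;     /* simulated hardware has finished it */
    int input;
    int result;
} request_slot;

typedef struct {
    request_slot slots[QUEUE_DEPTH];
} fake_device;

/* Post a request and return immediately with a ticket.
 * Returns -1 if every slot is busy. */
int device_submit(fake_device *dev, int input)
{
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        if (!dev->slots[i].in_use) {
            dev->slots[i].in_use = 1;
            dev->slots[i].done = 0;
            dev->slots[i].input = input;
            return i;
        }
    }
    return -1;
}

/* Stand-in for the hardware: completes every pending request. */
void device_run(fake_device *dev)
{
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        request_slot *s = &dev->slots[i];
        if (s->in_use && !s->done) {
            s->result = s->input * s->input;
            s->done = 1;
        }
    }
}

/* Poll a ticket: 1 = finished (*out set, slot freed),
 * 0 = still pending, -1 = unknown ticket. */
int device_poll(fake_device *dev, int ticket, int *out)
{
    if (ticket < 0 || ticket >= QUEUE_DEPTH || !dev->slots[ticket].in_use)
        return -1;
    if (!dev->slots[ticket].done)
        return 0;
    *out = dev->slots[ticket].result;
    dev->slots[ticket].in_use = 0;
    return 1;
}
```

    With this shape a single thread can keep several requests in flight and do other work between polls, which is the property the synchronous model lacks.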

Upvotes: 9

Views: 1545

Answers (2)

ivan_pozdeev

Reputation: 35986

According to "Interpreting openssl speed output for rsa with multi option", -multi doesn't "parallelize" the work; it just runs multiple benchmarks in parallel.

So, your HW card's load will essentially be limited by how much work is available at the moment (note that in industry in general, a planned capacity load of 80% is traditionally considered optimal, since it leaves headroom for load spikes). Of course, running multiple server threads/processes will give you the same effect as running multiple benchmarks.

OpenSSL supports multiple threads provided that you give it callbacks to lock shared data. For multiple processes, it warns about reusing data state inherited from the parent.

That's it for scaling vertically. For scaling horizontally:

  • OpenSSL supports asynchronous I/O through asynchronous BIOs
  • but its elemental crypto operations and internal ENGINE calls are synchronous, and changing this would require a logic overhaul
  • private efforts to make them operate asynchronously have met severe criticism due to major design flaws

Intel announced an "Asynchronous OpenSSL" project (08.2014) to use with its hardware, but the linked white paper gives few details about its implementation and development state. One developer published some related code (10.2015), noting that it's "stable enough to get an overview".

Upvotes: 3

borisp

Reputation: 180

As jww has mentioned in the comments, you should use the ENGINE API to accomplish this. There is an example in the above link on how to use that API. Usually, the hardware accelerator vendor implements a library called an "ENGINE"; this engine provides cryptographic acceleration and can be used by OpenSSL internally. Assuming that the accelerator you want to use has an ENGINE implemented (for example "cswift"), get the engine by calling ENGINE *e = ENGINE_by_id("cswift");, then initialize it with ENGINE_init(e);, and set it as the default for the operations you want to offload, for example ENGINE_set_default_RSA(e);

After calling these functions, you can use the high-level OpenSSL API (e.g. SSL_connect/read/write/accept).

Upvotes: 1
