Parallel block cipher in CTR mode and variable number of threads: how to deal with internal state and permit decryption?

Question

I'm implementing a parallel block cipher (Morus, to be precise) in CTR mode and I'd like to make it flexible with respect to the number of threads. It is not difficult per se, as I can partition the message and distribute chunks to any number of threads quite easily, since CTR mode permits an embarrassingly parallel implementation with basically no challenges. However, I'm not sure how to ensure decryption correctness when it happens with a different number of threads with respect to encryption.

Let's say a user decides to encrypt their message with 2 threads (T1 and T2), and let's say, for simplicity, that my implementation assigns the same (IV, key) pair to every thread. The message M is divided in two chunks M1 and M2 and each chunk assigned to a thread for encryption. Let's say that these chunks can be conceived as being divided in two sub-chunks each M1 = [M11,M12] and M2 = [M21,M22] (it will be useful later).

When T1 begins encrypting sub-chunk M12, its cipher's internal state will depend on the previous steps, i.e., initialization with (IV, key) and encryption of M11. This state will have an influence on the produced output, there will be "an echo" of it in the encryption of M12.

Overall, the 2 threads will produce C = Encr(M), with C = [C1,C2] and, again, C1 = [C11,C12] and C2 = [C21,C22], with Cij = Encr(Mij). Now, let's say that the same user attempts decrypting C with 4 threads (T1, T2, T3, and T4). When T2 tackles its chunk decryption (C12), it will have a freshly initialized cipher with (IV, key). Basically, it will decrypt its chunk in a different state with respect to the one in which it was encrypted.

In this situation, will decryption be successfull? I'm afraid it won't. I'm afraid it will have the same effect as decrypting with the wrong IV or key. I think that, in a parallel implementation, either one of these two conditions should hold:

Decryption should happen with the same number of threads as encryption;
Each thread should conceive its chunk as a sequence of independent blocks, thus re-initializing its cipher at each block's encryption/decrpytion.

I hope I'm wrong in this. 1) is not a nice solution, as it implies the necessity of a further "contract" between encrypting user and decrypting user (what if encryption is done on a GPU and decryption on an embedded board?). As for 2), it is not good either... in Morus, a state update is needed after each block encryption/decryption, but an initialization requires 16 rounds of state update!

Any thoughts? Is there a way to solve this nicely?

Parallel block cipher in CTR mode and variable number of threads: how to deal with internal state and permit decryption?

Answers (0)

Related Questions