Reputation: 405
I'm implementing a parallel block cipher (Morus, to be precise) in CTR mode and I'd like to make it flexible with respect to the number of threads. It is not difficult per se, as I can partition the message and distribute chunks to any number of threads quite easily, since CTR mode permits an embarrassingly parallel implementation with basically no challenges. However, I'm not sure how to ensure decryption correctness when it happens with a different number of threads with respect to encryption.
Let's say a user decides to encrypt their message with 2 threads (T1
and T2
), and let's say, for simplicity, that my implementation assigns the same (IV, key)
pair to every thread. The message M
is divided in two chunks M1
and M2
and each chunk assigned to a thread for encryption. Let's say that these chunks can be conceived as being divided in two sub-chunks each M1 = [M11,M12]
and M2 = [M21,M22]
(it will be useful later).
When T1
begins encrypting sub-chunk M12
, its cipher's internal state will depend on the previous steps, i.e., initialization with (IV, key)
and encryption of M11
. This state will have an influence on the produced output, there will be "an echo" of it in the encryption of M12
.
Overall, the 2 threads will produce C = Encr(M)
, with C = [C1,C2]
and, again, C1 = [C11,C12]
and C2 = [C21,C22]
, with Cij = Encr(Mij)
. Now, let's say that the same user attempts decrypting C
with 4 threads (T1
, T2
, T3
, and T4
). When T2
tackles its chunk decryption (C12
), it will have a freshly initialized cipher with (IV, key)
. Basically, it will decrypt its chunk in a different state with respect to the one in which it was encrypted.
In this situation, will decryption be successfull? I'm afraid it won't. I'm afraid it will have the same effect as decrypting with the wrong IV
or key
. I think that, in a parallel implementation, either one of these two conditions should hold:
I hope I'm wrong in this. 1) is not a nice solution, as it implies the necessity of a further "contract" between encrypting user and decrypting user (what if encryption is done on a GPU and decryption on an embedded board?). As for 2), it is not good either... in Morus, a state update is needed after each block encryption/decryption, but an initialization requires 16 rounds of state update!
Any thoughts? Is there a way to solve this nicely?
Upvotes: 0
Views: 125