Reputation: 952
I am encrypting a large (100GB+) file with Python using PyCryptodome using AES-256 in CBC mode.
Rather than read the entire file into memory and encrypt it in one fell swoop, I would like to read the input file a 'chunk' at a time and append to the output file with the results of encrypting each 'chunk.'
Regrettably, the documentation for PyCryptodome is lacking in that I can't find any examples of how to encrypt a long plaintext with multiple calls to encrypt(). All the examples use a short plaintext and encrypt the entire plaintext in a single call to encrypt().
I had assumed that if my input 'chunk' is a multiple of 16 bytes (the block size of AES in CBC mode) I wouldn't need to add padding to any 'chunk' but the last one. However, I wasn't able to get that to work. (I got padding errors while decrypting.)
I'm finding that in order to successfully decrypt the file, I need to add padding to every 'chunk' when encrypting, and decrypt in units of the input chunk size plus 16 bytes. This means the decrypting process needs to know the 'chunk size' used for encryption, which makes me believe that this is probably an incorrect implementation.
While I do have my encryption/decryption working as described, I wonder if this is the 'correct' way to do it. (I suspect it is not.) I've read inconsistent claims on whether or not every such 'chunk' needs padding. If not, I'd like some handholding to get Pycryptodome to encrypt and then decrypt a large plaintext across multiple calls to encrypt() and decrypt().
EDIT: This code throws a ValueError, "Padding is incorrect," when decrpyting the first 'chunk'.
def encrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open(infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) ==0:
break
insize = len(data)
if insize == (16 * 32):
padded_data = data
else:
padded_data = pad(data, AES.block_size)
fout.write(cipher.encrypt(padded_data))
def decrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open (infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) == 0:
break
fout.write(unpad(cipher.decrypt(data), AES.block_size))
Upvotes: 1
Views: 1008
Reputation: 11
My problem was related to the PAD of the last block. It is necessary to detect which is the last fragment read in bytes in order to add the PAD.
def decrypt_file(
self, filename: str, output_file: str, save_path: str, key, iv
):
cipher_aes = AES.new(key, AES.MODE_CBC, iv)
log.info(f'Decrypting file: {filename} output: {output_file}')
count = 0
previous_data = None
with open(filename, "rb") as f, open(
f"{save_path}/{output_file}", "wb"
) as f2:
while True:
count+=1
data = f.read(self.block_size)
if data == b"":
decrypted = cipher_aes.decrypt(previous_data)
log.info(f'Last block UnPadding Count: {count} BlockSize: {self.block_size}')
decrypted = unpad(decrypted, AES.block_size, style="pkcs7")
f2.write(decrypted)
break
if previous_data:
decrypted = cipher_aes.decrypt(previous_data)
f2.write(decrypted)
previous_data = data
And apply the decrypt:
def decrypt_file(
self, filename: str, output_file: str, save_path: str, key, iv
):
cipher_aes = AES.new(key, AES.MODE_CBC, iv)
log.info(f'Decrypting file: {filename} output: {output_file}')
count = 0
previous_data = None
with open(filename, "rb") as f, open(
f"{save_path}/{output_file}", "wb"
) as f2:
while True:
count+=1
data = f.read(self.block_size)
if data == b"":
decrypted = cipher_aes.decrypt(previous_data)
log.info(f'Last block UnPadding Count: {count} BlockSize: {self.block_size}')
decrypted = unpad(decrypted, AES.block_size, style="pkcs7")
f2.write(decrypted)
break
if previous_data:
decrypted = cipher_aes.decrypt(previous_data)
f2.write(decrypted)
previous_data = data
Upvotes: 1
Reputation: 952
It looks like the fix is to do similar chunksize/padding comparison in the decrypt function as I used in the encrypt function:
def decrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open (infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) == 0:
break
if len(data) == (16 * 32):
decrypted_data = cipher.decrypt(data)
else:
decrypted_data = unpad(cipher.decrypt(data), AES.block_size)
fout.write(decrypted_data)
Upvotes: 0