Joe Huang

Reputation: 6570

How can I encrypt text with crypto without increasing its size too much?

I am encrypting text like this (Node.js):

var crypto = require("crypto");
var text = "holds a long string...";
var cipher = crypto.createCipher("aes128", "somepassword");
var crypted = cipher.update(text, 'utf8', 'hex');
crypted += cipher.final('hex');

If I save text to a file directly, the file is N bytes. If I save crypted instead, the file is about 2N bytes.

Is there any way to make crypted as close to N bytes as possible?

Upvotes: 3

Views: 6494

Answers (2)

Artjom B.

Reputation: 61952

A modern cipher like AES works on binary data. When you encrypt character data, it is first transformed into a binary representation, which is exactly what UTF-8 encoding does. After encryption you get arbitrary binary data, which is usually not a valid UTF-8 sequence (UTF-8, like most text encodings, has a specific byte structure), so you cannot safely decode it back into a string.
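A small Node.js sketch of why ciphertext should not be treated as UTF-8 text. The byte values below are arbitrary, chosen for illustration: 0xff and 0xfe can never appear in well-formed UTF-8, so decoding them replaces them with U+FFFD and the original bytes cannot be recovered.

```javascript
// Text becomes bytes via UTF-8 before encryption.
const plainBytes = Buffer.from("héllo", "utf8");
console.log(plainBytes.length); // 6: "é" takes two bytes in UTF-8

// Arbitrary bytes (such as ciphertext) are often not valid UTF-8.
const raw = Buffer.from([0x68, 0xff, 0xfe, 0x69]);
const asText = raw.toString("utf8"); // invalid bytes become U+FFFD
const roundTripped = Buffer.from(asText, "utf8");

console.log(raw.equals(roundTripped)); // false: the original bytes are lost
```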

If you omit the outputEncoding argument of cipher.update() and cipher.final(), they return Buffers, which you can concatenate and write to a file directly. A Buffer holds raw bytes; it is only shown as hex when printed. When you write the Buffers to a file, the file size will be close to the plaintext size, but it will never reach it.
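A minimal sketch of writing the raw Buffer output to a file. It assumes a random key and IV with crypto.createCipheriv (createCipher is deprecated), and the temporary file name crypted.bin is hypothetical.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumption: random 128-bit key and IV, for this demo only.
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);
const text = "holds a long string...".repeat(1000); // 22000 bytes of UTF-8

const cipher = crypto.createCipheriv("aes-128-cbc", key, iv);
// No output encoding: update() and final() return Buffers.
const crypted = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);

const file = path.join(os.tmpdir(), "crypted.bin"); // hypothetical file name
fs.writeFileSync(file, crypted); // raw bytes, no hex/base64 inflation

const plainSize = Buffer.byteLength(text, "utf8");
const fileSize = fs.statSync(file).size;
console.log(plainSize, fileSize); // 22000 22016: only the padding is added
fs.unlinkSync(file); // clean up the demo file
```

Because 22000 is already a multiple of 16, padding adds a full extra block here.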

AES is a block cipher and can only encrypt a single block of exactly 16 bytes. A mode of operation like ECB or CBC enables you to encrypt multiple blocks. Finally, a padding scheme like the default PKCS#7 padding enables you to encrypt texts of arbitrary length. The padding is added to the plaintext before the actual encryption and always adds between 1 and 16 bytes, so that the total length becomes a multiple of 16.
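The padding rule can be checked empirically. The sketch below uses a helper named paddedLength (hypothetical, for illustration) and compares it with the actual output length of Node's AES-128-CBC, which has auto padding on by default.

```javascript
const crypto = require("node:crypto");

// PKCS#7 rounds up to the next multiple of the block size and always adds
// at least one byte, so an already-aligned plaintext gets a full extra block.
function paddedLength(n, blockSize = 16) {
  return (Math.floor(n / blockSize) + 1) * blockSize;
}

// Reusing one key/IV is acceptable here only because we just measure lengths.
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);
const results = [];
for (const n of [0, 1, 15, 16, 17, 31, 32]) {
  const cipher = crypto.createCipheriv("aes-128-cbc", key, iv);
  const out = Buffer.concat([cipher.update(Buffer.alloc(n)), cipher.final()]);
  results.push({ n, len: out.length });
  console.log(`${n} -> ${out.length} bytes (padding: ${out.length - n})`);
}
```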

You can use cipher.setAutoPadding(false) to prevent padding, but then the input length must be a multiple of 16 bytes, so you have to pad it yourself. You could also use a streaming mode like CTR ("aes-128-ctr"), which needs no padding, but then you need to provide a unique IV (nonce) for it to have any security; in Node.js, the IV for AES-CTR is the full 16-byte initial counter block. This IV doesn't have to be secret, but you have to transport it to the decrypter.
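A sketch of the CTR approach, assuming a random key and a random 16-byte IV that is simply prepended to the ciphertext (one common layout, not the only one). The ciphertext itself has exactly the plaintext's length; the stored data is 16 bytes longer because of the IV.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16);
// 16-byte IV (initial counter block); must never repeat under the same key.
const iv = crypto.randomBytes(16);
const text = "holds a long string...";

const cipher = crypto.createCipheriv("aes-128-ctr", key, iv);
const ciphertext = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);

// CTR is a stream mode: no padding, so the lengths match exactly...
console.log(Buffer.byteLength(text), ciphertext.length); // 22 22

// ...but the IV must travel with the ciphertext, adding 16 bytes.
const stored = Buffer.concat([iv, ciphertext]);

// Decryption: split the IV back off the front.
const decipher = crypto.createDecipheriv("aes-128-ctr", key, stored.subarray(0, 16));
const recovered = Buffer.concat([
  decipher.update(stored.subarray(16)),
  decipher.final(),
]).toString("utf8");
```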

In the end, it is not possible for the stored ciphertext to be exactly the same size as the plaintext. There is always something that inflates it: padding, an IV/nonce, or an authentication tag.


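Never use crypto.createCipher. It derives the key and the IV deterministically from the password, so the same password always produces the same IV, and it is deprecated. You need a randomized cipher to get semantic security: use crypto.createCipheriv with a fresh, random IV. For CTR mode the IV must be unique, and for CBC mode it must be unpredictable.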
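A sketch of what "randomized" means in practice, assuming AES-128-CBC with a random key and a fresh random IV per message, prepended to the output (a common convention, assumed here). Encrypting the same plaintext twice yields different ciphertexts, yet both decrypt correctly.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16); // assumption: random key for the demo

// Fresh, unpredictable IV for every encryption (required for CBC).
function encrypt(text) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv("aes-128-cbc", key, iv);
  const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body]); // the IV is not secret; prepend it
}

function decrypt(data) {
  const decipher = crypto.createDecipheriv("aes-128-cbc", key, data.subarray(0, 16));
  return Buffer.concat([decipher.update(data.subarray(16)), decipher.final()]).toString("utf8");
}

const a = encrypt("attack at dawn");
const b = encrypt("attack at dawn");
console.log(a.equals(b)); // false: same plaintext, different ciphertexts
console.log(decrypt(a) === decrypt(b)); // true
```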

Always use authenticated encryption, such as AES-GCM. It enables you to detect wrong keys and (malicious) tampering with ciphertexts.
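A minimal AES-GCM sketch in Node.js. The layout nonce || tag || ciphertext is an assumption for illustration, as are the helper names encryptGcm and decryptGcm. GCM needs no padding, so the overhead is a fixed 12-byte nonce plus a 16-byte tag, and flipping a single bit makes decryption fail.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16); // assumption: random 128-bit key

function encryptGcm(text) {
  const iv = crypto.randomBytes(12); // 12-byte nonce, unique per message
  const cipher = crypto.createCipheriv("aes-128-gcm", key, iv);
  const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, body]);
}

function decryptGcm(data) {
  const iv = data.subarray(0, 12);
  const tag = data.subarray(12, 28);
  const body = data.subarray(28);
  const decipher = crypto.createDecipheriv("aes-128-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (wrong key or tampered data).
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const sealed = encryptGcm("holds a long string...");
console.log(sealed.length); // 12 + 16 + 22 = 50

const tampered = Buffer.from(sealed);
tampered[tampered.length - 1] ^= 1; // flip one ciphertext bit
let rejected = false;
try {
  decryptGcm(tampered);
} catch {
  rejected = true;
}
console.log(rejected); // true
```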

Upvotes: 4

Scolytus

Reputation: 17157

The problem is your 'hex' encoding. Basically, you tell the cipher to:

  1. get a binary representation of your string text using the utf8 encoding
  2. encrypt it
  3. convert the resulting bytes to a string using hex encoding

Hexadecimal encoding uses two characters (two bytes on disk) to represent each actual byte, so you get a file approximately twice the size of your plaintext.

The solution is to use a more efficient encoding for your ciphertext that can still represent all 256 possible byte values, which rules out simply decoding the bytes as a UTF-8 string. Try:

var crypted = cipher.update(text, 'utf8', 'base64');
crypted += cipher.final('base64');

This encodes the ciphertext as a Base64 string, which uses 4 characters for every 3 bytes, so the overhead drops from about 100% to about 33%.
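A sketch comparing the three representations of the same ciphertext. It uses crypto.createCipheriv with a random key and IV (an assumption, since createCipher is deprecated); the encoding overhead is the same either way.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);
const text = "x".repeat(1000); // 1000 bytes of UTF-8

const cipher = crypto.createCipheriv("aes-128-cbc", key, iv);
const raw = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);

const hex = raw.toString("hex"); // 2 characters per byte
const b64 = raw.toString("base64"); // 4 characters per 3 bytes

console.log(raw.length, hex.length, b64.length); // 1008 2016 1344
```

All hex and Base64 characters are ASCII, so each one occupies a single byte in the file.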

I created an online example; these were the results:

text:                  488890
crypted hex length:    977792, ratio: 2.0000245453987606
crypted base64 length: 651864, ratio: 1.3333551514655648
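The reported figures can be reproduced from the plaintext length alone. This sketch assumes the text length counts bytes (i.e. the text is ASCII) and that AES with 16-byte blocks and PKCS#7 padding was used.

```javascript
const plainLength = 488890;
// PKCS#7: round up to the next multiple of 16 (always at least 1 byte added).
const cipherLength = (Math.floor(plainLength / 16) + 1) * 16; // 488896
const hexLength = cipherLength * 2; // 977792
const base64Length = Math.ceil(cipherLength / 3) * 4; // 651864

console.log(hexLength / plainLength, base64Length / plainLength);
```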

Security note: don't use this key/IV generation in production. I strongly advise using a different IV for each encryption, via crypto.createCipheriv(algorithm, key, iv). For demo purposes, though, it is fine.
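A sketch of the Base64 approach rewritten around crypto.createCipheriv. Since the question starts from a password, it assumes the key is derived with crypto.scryptSync and a random salt (a swapped-in technique, not part of the answer above). The "salt:iv:ciphertext" string format and the helper names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Assumption: scrypt with a random salt replaces createCipher's weak
// built-in password-to-key derivation.
function encryptToBase64(text, password) {
  const salt = crypto.randomBytes(16);
  const key = crypto.scryptSync(password, salt, 16); // 16 bytes for AES-128
  const iv = crypto.randomBytes(16); // fresh IV per encryption
  const cipher = crypto.createCipheriv("aes-128-cbc", key, iv);
  let crypted = cipher.update(text, "utf8", "base64");
  crypted += cipher.final("base64");
  // Salt and IV are not secret, but the decrypter needs both.
  return [salt.toString("base64"), iv.toString("base64"), crypted].join(":");
}

function decryptFromBase64(stored, password) {
  const [salt, iv, crypted] = stored.split(":").map((s) => Buffer.from(s, "base64"));
  const key = crypto.scryptSync(password, salt, 16);
  const decipher = crypto.createDecipheriv("aes-128-cbc", key, iv);
  return Buffer.concat([decipher.update(crypted), decipher.final()]).toString("utf8");
}

const stored = encryptToBase64("holds a long string...", "somepassword");
console.log(decryptFromBase64(stored, "somepassword")); // holds a long string...
```

Storing the salt and IV adds a fixed overhead per message, which matters little for long texts.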

Upvotes: 5
