Sniggerfardimungus

Reputation: 11782

AES CBC encryption of streams in Ruby?

I've been using a fairly standard example (one that is badly broken for my purposes) of CBC encryption in Ruby:

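require 'openssl'
require 'digest'

# m is :encrypt or :decrypt, k is a passphrase, t is the whole message
def aes(m, k, t)
  aes = OpenSSL::Cipher.new('aes-256-cbc')
  aes.public_send(m)                  # switch the cipher into encrypt or decrypt mode
  aes.key = Digest::SHA256.digest(k)  # 32-byte key derived from the passphrase; no IV is set
  aes.update(t) << aes.final          # the entire message is processed in one call
end

def encrypt(key, text)
  aes(:encrypt, key, text)
end

def decrypt(key, text)
  aes(:decrypt, key, text)
end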

This works as an acceptable starting point, but I need to be able to encrypt large streams of data without loading them into one huge chunk of memory. I want to load a meg at a time, update the state of the encryption stream, then move on to the next block. Looking at the docs on the OpenSSL Cipher (which are award-winningly poor) I expect that the call to update should simply continue the stream of data. However, a simple test tells me that there is something very wrong:

require 'openssl'
require 'digest'
require 'base64'

Length = 256
newaes = OpenSSL::Cipher.new('aes-256-cbc')
newaes.encrypt
newaes.key = Digest::SHA256.digest("foo")
puts Base64.encode64(newaes.update("a"*Length))
puts Base64.encode64(newaes.update("a"*Length))
puts Base64.encode64(newaes.final)

Running this with different values for Length should not give me different streams. However, after the end of the first update, there is always a problem. The streams diverge. I was guessing that the problem was that for some inexplicable reason, the terminating null ('\0') character at the end of the string was being encrypted. After all, each call to update is returning a string that is ((string.length / 16) + 1) * 16 bytes long, implying that it is encrypting an extra byte with each update.
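A quick way to test the length claim is to print the raw byte counts that `update` and `final` return, before any Base64 encoding is involved. This is a minimal check that assumes a random IV (the snippet above sets none). In OpenSSL's CBC mode, `update` should return only whole 16-byte blocks and buffer the remainder, with the padding coming only from `final`.

```ruby
require 'openssl'
require 'digest'

cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
cipher.key = Digest::SHA256.digest('foo')
cipher.random_iv                     # generates and sets a random IV

# Measure raw byte counts, before any Base64 encoding.
first  = cipher.update('a' * 255)    # 255 bytes in: only whole 16-byte blocks come out
second = cipher.update('a' * 255)    # 15 buffered + 255 new = 270 bytes available
last   = cipher.final                # 14 leftover bytes + 2 bytes of padding

puts first.bytesize   # 240
puts second.bytesize  # 256
puts last.bytesize    # 16
```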

How do I get OpenSSL's encryption and decryption to operate in a mode where I can pass in blocks of data and get the same result back, regardless of the size of the chunks that I break the data into?

EDIT:

The issue is independent of the Base64 encoding. The following produces 3 different ciphertext digests, one per block size:

require 'digest/sha2'
require 'base64'
require 'openssl'

def base64(data)
  Base64.encode64(data).chomp
end

def crypt_test(blocksize)
  newaes = OpenSSL::Cipher.new('aes-256-cbc')
  newaes.encrypt
  newaes.key = Digest::SHA256.digest("foo")
  plaintext = ""
  cyphertext = ""
  File.open("black_bar.jpg", "rb") do |fd|   # binary mode for the JPEG
    until fd.eof?
      data = fd.read(blocksize)
      cyphertext += data                     # appends the plaintext chunk to cyphertext; plaintext stays ""
      cyphertext += newaes.update(data)
    end
  end
  cyphertext += newaes.final
  puts base64(Digest::SHA256.digest(plaintext))
  puts base64(Digest::SHA256.digest(cyphertext))
  puts
end

crypt_test(1024)
crypt_test(512)
crypt_test(2048)
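Note that the loop above appends each plaintext chunk to `cyphertext` (and never to `plaintext`), so the hashed string interleaves plaintext and ciphertext differently for each block size. Below is a minimal sketch of the same test that appends only cipher output. It assumes an in-memory `StringIO` of random bytes in place of `black_bar.jpg`, and a fixed all-zero IV purely so that separate runs can be compared; `encrypt_stream` is a hypothetical helper name.

```ruby
require 'openssl'
require 'digest'
require 'stringio'

KEY = Digest::SHA256.digest('foo')
# Fixed all-zero IV, used here ONLY so that separate runs are comparable.
# Real messages need a fresh random IV each time.
IV = ("\0" * 16).b

# Encrypt everything readable from io, blocksize bytes at a time, and return
# the ciphertext. Only cipher output is appended; the plaintext never is.
def encrypt_stream(io, blocksize)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = IV
  ciphertext = String.new(encoding: Encoding::BINARY)
  while (chunk = io.read(blocksize))   # read returns nil at end of input
    ciphertext << cipher.update(chunk)
  end
  ciphertext << cipher.final
end

data = Random.new(42).bytes(10_000)   # hypothetical stand-in for black_bar.jpg
digests = [1024, 512, 2048, 1000].map do |size|
  Digest::SHA256.hexdigest(encrypt_stream(StringIO.new(data), size))
end
puts digests.uniq.size   # 1
```

Block sizes that are not multiples of 16 (such as 1000) give the same result, because the cipher object buffers any partial block until the next `update` or `final`.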

Upvotes: 4

Views: 3738

Answers (2)

Omnifarious

Reputation: 56068

Here is your problem. Concatenate the raw ciphertext pieces first, then Base64-encode the result once:

require 'openssl'
require 'digest'
require 'base64'

Length = 256
newaes = OpenSSL::Cipher.new('aes-256-cbc')
newaes.encrypt
newaes.key = Digest::SHA256.digest("foo")
s1 = newaes.update("a"*Length)
s2 = newaes.update("a"*Length)
s3 = newaes.final
puts Base64.encode64(s1 + s2 + s3)   # join the raw bytes, then encode once

This will now output the exact same base64 as if you squish the two updates into one.

You are running into an 'alignment' issue with Base64 encoding. Base64 encoding takes 3 bytes at a time and transforms them into 4 characters. If you give it a number of bytes that is not a multiple of 3, it pads the output with '=' signs.

This means if you have two successive encoding runs that are not a multiple of 3 bytes long, and then encode the exact same sequence of bytes in just one encoding run, you will get different base64 output. The second encoding run is not 'aligned' the same way as it would be if the data were part of the first encoding run. Here are some examples:

Here, the data is a multiple of 3. The two runs of the encoder produce Base64 sequences that can be concatenated to produce the same sequence as one run of the encoder over the concatenated string, apart from the newline that encode64 appends to each run.

> Base64.encode64('abc')
=> "YWJj\n"
> Base64.encode64('def')
=> "ZGVm\n"
> Base64.encode64('abcdef')
=> "YWJjZGVm\n"

Here the data is split up into sequences of 4 bytes, and 4 is not a multiple of 3. The concatenation of the two runs of the encoder is not the same as the encoding of the two strings concatenated.

> Base64.encode64('abcd')
=> "YWJjZA==\n"
> Base64.encode64('efgh')
=> "ZWZnaA==\n"
> Base64.encode64('abcdefgh')
=> "YWJjZGVmZ2g=\n"
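Following the same logic, Base64 output can be produced incrementally as long as every chunk except the last is a multiple of 3 bytes. A minimal sketch, using `Base64.strict_encode64` (which emits no newlines); `base64_stream` is a hypothetical helper. With `Base64.encode64`, which inserts a newline after every 60 output characters (45 input bytes), the chunk size would instead need to be a multiple of 45.

```ruby
require 'base64'
require 'stringio'

# Base64-encode an IO piece by piece. Because every chunk except possibly the
# last is a multiple of 3 bytes, no '=' padding appears mid-stream and the
# pieces join into exactly the one-shot encoding.
def base64_stream(io, chunk_size = 3 * 1024)
  raise ArgumentError, 'chunk_size must be a multiple of 3' unless (chunk_size % 3).zero?
  out = String.new
  while (chunk = io.read(chunk_size))
    out << Base64.strict_encode64(chunk)
  end
  out
end

data = 'abcdefgh' * 1000
streamed = base64_stream(StringIO.new(data), 300)
puts streamed == Base64.strict_encode64(data)     # true

# Chunks of 4 bytes do not line up: each piece ends in '=' padding.
pieces = Base64.strict_encode64('abcd') + Base64.strict_encode64('efgh')
puts pieces == Base64.strict_encode64('abcdefgh') # false
```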

Upvotes: 2

Thomas Pornin

Reputation: 74482

I have about zero knowledge of Ruby. However, your problem looks like a padding issue.

AES/CBC encrypts data in blocks of 16 bytes, no less. Padding is about adding a few bytes such that:

  1. the padded length is a multiple of 16;
  2. upon decryption, the extra bytes can be unambiguously removed.

The second condition means that there cannot be a "zero-length padding" (at least, not without resorting to dark trickery such as "ciphertext stealing"). There must be at least one extra byte of padding. Otherwise, the decryptor would not know whether the end of the obtained data is really some padding, or the actual message which happens to end in some bytes which "look like" padding.

A very common padding scheme is the one specified in PKCS#5 (see section 6.1.1): for blocks of length n (n=16 for AES), at least 1 and at most n bytes are added; if k bytes are added, then they all have numerical value k. Upon decryption, one just needs to look at the numerical value of the last byte to know how many padding bytes were added. The PKCS#5 padding scheme implies the behaviour that you observe: encryption of m bytes produces n*((m/n)+1) output bytes, with m/n rounded down.
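The padding rule can be written out directly. This sketch implements the PKCS#5 rule for n = 16 (for 16-byte blocks it is usually called PKCS#7 padding) and checks it against OpenSSL by encrypting the manually padded message with OpenSSL's own padding switched off via `padding = 0`. `pkcs_pad`, `pkcs_unpad` and `cbc_encrypt` are hypothetical names, and the random key and IV exist only for the demonstration.

```ruby
require 'openssl'

BLOCK = 16  # AES block length n, in bytes

# Add k bytes of value k, with 1 <= k <= n (never zero bytes).
def pkcs_pad(data, n = BLOCK)
  k = n - (data.bytesize % n)
  data.b + ([k] * k).pack('C*')
end

# Read the last byte to learn k, check the k padding bytes, and strip them.
def pkcs_unpad(data, n = BLOCK)
  k = data.getbyte(-1)
  unless k && k.between?(1, n) && data.byteslice(-k, k) == ([k] * k).pack('C*')
    raise ArgumentError, 'invalid padding'
  end
  data.byteslice(0, data.bytesize - k)
end

# Encrypt with OpenSSL, optionally with its built-in padding turned off.
def cbc_encrypt(data, key, iv, padding: true)
  c = OpenSSL::Cipher.new('aes-256-cbc')
  c.encrypt
  c.key = key
  c.iv = iv
  c.padding = padding ? 1 : 0
  c.update(data) + c.final
end

key = OpenSSL::Random.random_bytes(32)
iv  = OpenSSL::Random.random_bytes(16)
msg = 'a' * 20

manual  = cbc_encrypt(pkcs_pad(msg), key, iv, padding: false)
builtin = cbc_encrypt(msg, key, iv)
puts manual == builtin            # true
puts pkcs_pad('a' * 16).bytesize  # 32: a full block of padding is added
```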

If your calls indeed add a PKCS#5 padding at each update, then you can recover from that by removing the last 16 bytes of what they return. You will also have to reset the IV for the next update call, so that what the next update call returns can be simply appended. Speaking of which, I see nothing in your code about the IV, and that's fishy. CBC mode requires a new random IV (selected with a "strong enough" generator) for each message; the IV must then be transmitted along with the encrypted message (whoever decrypts the data will need it; the IV can be sent "in the clear").
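With Ruby's OpenSSL binding, padding should only be added by `final`, so the chunked loop the question asks for needs no trimming between `update` calls; the remaining issue is the IV. Below is a minimal sketch that generates a fresh IV with `random_iv`, writes it in front of the ciphertext, and reads it back before decrypting. `encrypt_io`, `decrypt_io` and the 1 MiB `CHUNK` size are assumptions for illustration, and the key is assumed to be 32 random bytes rather than a SHA-256 hash of a passphrase.

```ruby
require 'openssl'
require 'stringio'

CHUNK = 1024 * 1024   # read 1 MiB at a time

# Write IV || ciphertext to output, reading input in CHUNK-sized pieces.
def encrypt_io(key, input, output)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  output.write(cipher.random_iv)   # fresh random IV per message; it need not be secret
  while (chunk = input.read(CHUNK))
    output.write(cipher.update(chunk))
  end
  output.write(cipher.final)       # last partial block plus PKCS#5 padding
end

# Reverse of encrypt_io: take the IV from the first 16 bytes, then decrypt the rest.
def decrypt_io(key, input, output)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv = input.read(cipher.iv_len)
  while (chunk = input.read(CHUNK))
    output.write(cipher.update(chunk))
  end
  output.write(cipher.final)       # checks and strips the padding
end

key    = OpenSSL::Random.random_bytes(32)
plain  = Random.new(1).bytes(2 * CHUNK + 5)
sealed = StringIO.new(''.b)
encrypt_io(key, StringIO.new(plain), sealed)
sealed.rewind
opened = StringIO.new(''.b)
decrypt_io(key, sealed, opened)
puts opened.string == plain   # true
```

Only one `CHUNK` of plaintext is held in memory at a time; the `StringIO` objects here stand in for real files opened in binary mode.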

The paragraph above should be clearer if you know how CBC works. Wikipedia has good schematics on that.
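CBC chains blocks: each plaintext block is XORed with the previous ciphertext block (the IV, for the first block) before being encrypted, i.e. C_i = E_K(P_i XOR C_{i-1}) with C_0 = IV. The sketch below builds CBC by hand from single-block AES (`aes-256-ecb` with padding disabled) and compares it with OpenSSL's `aes-256-cbc`. It illustrates why the cipher object has to carry state between `update` calls and why the first block needs an IV. `aes_block` and `manual_cbc` are hypothetical helpers meant for illustration, not as a replacement for the library mode.

```ruby
require 'openssl'

# One raw AES-256 block encryption: ECB mode with padding off, fed exactly 16 bytes.
def aes_block(key, block)
  ecb = OpenSSL::Cipher.new('aes-256-ecb')
  ecb.encrypt
  ecb.key = key
  ecb.padding = 0
  ecb.update(block) + ecb.final
end

# CBC by hand: C_0 = IV, C_i = E_K(P_i XOR C_{i-1}).
# padded must already be a multiple of 16 bytes long.
def manual_cbc(key, iv, padded)
  prev = iv
  blocks = padded.bytes.each_slice(16).map do |plain_block|
    mixed = plain_block.zip(prev.bytes).map { |a, b| a ^ b }.pack('C*')
    prev = aes_block(key, mixed)   # this ciphertext block feeds the next one
  end
  blocks.join
end

key    = OpenSSL::Random.random_bytes(32)
iv     = OpenSSL::Random.random_bytes(16)
padded = 'a' * 48

cbc = OpenSSL::Cipher.new('aes-256-cbc')
cbc.encrypt
cbc.key = key
cbc.iv = iv
cbc.padding = 0
expected = cbc.update(padded) + cbc.final

puts manual_cbc(key, iv, padded) == expected   # true
```

Although the three plaintext blocks are identical, the chaining makes the three ciphertext blocks differ.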

Upvotes: 3
