Nemoden
Nemoden

Reputation: 9056

What type of encryption do I need?

Ok, the original task is to track users among 2 "friendly" web-sites who are able to share users cookies (lets say, I have example.com and my friend has mysite.com and also he has a domain simple.example.com so he can set cookies on .example.com).

To track users activity we want to set unique cookie, this cookie should be unique and 32 bytes long (ascii). Quite simple from this point of view and can be implemented as such:

md5(microtime)

and that's it, but now we have new constraints:

  1. we should be able to tell who exactly has set the cookie: exmaple.com engine or mysite.com engine

  2. 32 bytes longs is a must, still

  3. we should be able to encrypt timestamp (when cookies was issued)

  4. first and last character of the resulting cookie value should be different so we can do A/B testing basing on the cookie (so we could always say if last character of the cookie is "> K", show this users "feature A")

Given that the resulting string should always be 32 or less characters long and data should be encrypted and decrypted (not by users, of course) and the string should be unique for the users, it makes the task quite complex.

My thought and questions:

My first take was to pack data into binary string and than base64-encode it. The result would be 8-chars long base64-encoded string:

def encode():
    base64( pack('Lv', timestamp, microseconds) )

Add site-issuer flag and chars at the beginning and the end:

def getCookie():
    rand('a'...'Z') + encode() + issuerFlagChar() + rand('a'...'Z')

So, the result is 11 chars long and we meet constraint 2 easily.

But the problem is: this algorithm is not secure for sure, I'm not sure if the resulting string for millions of websites users is unique.

I wonder if I could use DES or AES for this purpose but I'm not sure that the resulting string will always meet constraint 2 (resulting string should be no longer than 32 ascii chars).

Is there symmetric key algorithms that ensure something like "if you encrypt N bytes with M-bytes key you will have resulting data length of Math.Ceil(N*2+1/M) bytes"? So the resulting length would be predictable?

Upvotes: 2

Views: 551

Answers (2)

Dirbaio
Dirbaio

Reputation: 3140

First of all, you need to know whether you want to encrypt or sign the data.

Encrypting will prevent users from seeing the data, but they are still able to modify it in some ways depending on the encryption type. For example, decrypting a modified ciphertext will simply give corrupted data, it won't fail.

Signing, on the other hand, will prevent users from modifying the data, that is, your code will be able to detect the data has been modified. A simple algorithm for this is HMAC.

I'll assume you want both. My solution below does both.

Your cookie must be 32 bytes long, which is 256 bits. We are going to use 128 bits for encrypted data and 128 bits for the HMAC.

For the data, I will encode the timestamp as a 64bit integer (more than enough even if you want to store it to microsecond precision). The site that issued the cookie can be stored as 1 bit if you have two sites, but I'll store it in a 32bit integer because we have plenty of space. Same for a tag you can use for a/b testing.

All the data is exactly 128 bits, 16 bytes. This is the exact size of an AES block. So, we will encrypt it with AES!

The other 16 bytes will be a MAC of the ciphertext (Encrypt then MAC). I used HMAC-SHA256, which has 256bits of output. We only have room for 128bits, so I have truncated it. In theory this makes it less secure, but in practice 128bit is big enough to make a brute-force attempt impossible.

Decrypting the cookie is the reverse process: calculate the HMAC of the given ciphertext and check it matches the given MAC. If so, proceeed to decrypt the ciphertext and unpack the data.

Here's the code:

from struct import pack, unpack
from Crypto.Cipher import AES
import hashlib
import hmac


AES_KEY = hashlib.sha256(b"secret key 1 asdfasdf").digest()
HMAC_KEY = hashlib.sha256(b"secret key 2 asdfasdf").digest()

# timestamp: 64bit unix timestamp
# site: 32bit integer, which site issued the cookie
# tag: 32bit integer, tag used for a/b testing.
def encrypt_cookie(timestamp, site, tag):

    # Pack the data
    data = pack('QII', timestamp, site, tag)

    # Encrypt it
    aes = AES.new(AES_KEY, AES.MODE_ECB, 'This is an IV456')
    ciphertext = aes.encrypt(data)

    # Do HMAC of the ciphertext
    sig = hmac.new(HMAC_KEY, ciphertext, hashlib.sha256).digest()
    sig = sig[:16]   # Truncate to only first 16 bytes.

    return ciphertext + sig

def decrypt_cookie(cookie):

    # Do HMAC of the ciphertext
    sig = hmac.new(HMAC_KEY, cookie[:16], hashlib.sha256).digest()
    sig = sig[:16]   # Truncate to only first 16 bytes.

    # Check the HMAC is ok
    if sig != cookie[16:]:
        raise Exception("Cookie has been tampered with")

    # Decrypt it
    aes = AES.new(AES_KEY, AES.MODE_ECB, 'This is an IV456')
    data = aes.decrypt(cookie[:16])

    # unPack the data
    timestamp, site, tag = unpack('QII', data)

    return timestamp, site, tag

cookie = encrypt_cookie(1, 2, 3)
print(len(cookie))  # prints: 32
print(decrypt_cookie(cookie))  # prints: 1, 2, 3

# Change a single byte in the cookie, the last one
cookie = cookie[:31] + b'0'
print(decrypt_cookie(cookie))  # raises the exception

I'm curious to know why the cookie must be 32bytes though. Seems a weird requirement, and if you didn't have it, you'd be able to use many libraries that are designed to solve exactly this problem, such as Django signing if you're using Django.

Upvotes: 4

publysher
publysher

Reputation: 11372

Setting aside the fact that you should indeed consult a security consultant, the actual question you pose can easily be answered:

Is there symmetric key algorithms that ensure something like "if you encrypt N bytes with M-bytes key you will have resulting data length of Math.Ceil(N*2+1/M) bytes"? So the resulting length would be predictable?

Yes there are. And they are called Block Ciphers.

By definition, every block cipher has the property that the length of the ciphertext is equal to the length of the plain text. In practice most block ciphers (inclusing DES and AES) cheat a bit because they require the plaintext to be padded to the length of the block before they start encrypting.

In other words, given a plaintext of N bytes and a block size of B, the ciphertext will have a length of B*(Math.ceil(N/B)) bytes.

Note how I am talking about the block size, which is different from the key size. The key size is actually irrelevant in this case.

For example, AES uses a block size of 128 bits, or 16 bytes. This means that if your plain text is between 17 and 32 bytes long, AES will guarantee that your ciphertext is 32 bytes long. This is independent from the key size you choose, which can be one of 128, 192 or 256 bits (16, 24 or 32 bytes).

Upvotes: 7

Related Questions