Bon Jon

Reputation: 43

How do I decode utf-8 from a file

I have a program that attempts to encrypt a message using AES. Encrypting the message fails with TypeError: Object type <class 'str'> cannot be passed to C code. If I encode the message to UTF-8 first, encryption works, but when I decrypt, the result still carries the b'...' wrapper, so the base64 decoding fails and my IV is not 16 bytes. When I try to decode the first line of the file using aes.decrypt(file.readline().decode("utf-8")), I get an error saying I can't use decode on a str.

import base64

from Crypto.Cipher import AES
from Crypto import Random

def pad(s):
    pad = s + (16 - len(s) % 16) * chr(16 - len(s) % 16)
    return str(pad)

def unpad(s):
    unpad = s[:-ord(s[len(s)-1:])]
    return str(unpad)


class AESCipher:
    def __init__( self, key ):
        self.key = key

    def encrypt( self, s ):
        raw = pad(s)
        iv = Random.new().read( AES.block_size )
        cipher = AES.new( self.key, AES.MODE_CBC, iv )
        return base64.b64encode( iv + cipher.encrypt( raw.encode("utf-8") ) )

    def decrypt( self, enc ):
        enc = base64.b64decode(enc)
        iv = enc[:16]
        cipher = AES.new(self.key, AES.MODE_CBC, iv )
        return unpad(cipher.decrypt( enc[16:] ))

I'm new to encryption so I don't really know if this has been answered before and I just don't know how to word it, but I've been looking around for a few hours and haven't found anything. Thank you. Again, sorry if this isn't worded properly.

Upvotes: 3

Views: 3441

Answers (1)

Tomalak

Reputation: 338316

Your encrypt and decrypt operations are not mirror images of each other.

def encrypt( self, s ):
    iv = Random.new().read( AES.block_size )       # new IV
    cipher = AES.new( self.key, AES.MODE_CBC, iv ) # create cipher
    payload = s.encode("utf-8")                    # string to bytes
    encrypted = cipher.encrypt(pad(payload))       # pad before encrypt
    return base64.b64encode( iv + encrypted )      # b64 data

def decrypt( self, enc ):
    data = base64.b64decode( enc )                 # b64 data
    iv = data[:AES.block_size]                     # split it up: IV first
    encrypted = data[AES.block_size:]              # then the ciphertext
    cipher = AES.new(self.key, AES.MODE_CBC, iv )  # recreate cipher
    payload = unpad(cipher.decrypt( encrypted ))   # unpad after decrypt
    return payload.decode("utf8")                  # bytes to string
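
Note that pad and unpad in the methods above now receive bytes, while the helpers in the question build the padding with chr() and wrap the result in str(), so they only work on str. A minimal sketch of byte-oriented versions, assuming the same PKCS#7-style scheme the question's helpers use; BLOCK_SIZE is a name introduced here for illustration and has the same value as AES.block_size:

```python
BLOCK_SIZE = 16  # assumption: AES block size in bytes (AES.block_size)


def pad(data: bytes) -> bytes:
    """Append n bytes of value n so the length is a multiple of BLOCK_SIZE."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE    # always 1..16, never 0
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Strip the padding added by pad(); the last byte holds its length."""
    if not data or len(data) % BLOCK_SIZE:
        raise ValueError("data length is not a multiple of the block size")
    n = data[-1]                               # indexing bytes yields an int
    if not 1 <= n <= BLOCK_SIZE or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]
```

PyCryptodome also ships pad and unpad in Crypto.Util.Padding, which may be preferable to hand-written helpers.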

Only bytes can be encrypted. Strings are not bytes, so encoding strings into a byte representation first is necessary. UTF-8 is a suitable representation, but it could be UTF-16 or even UTF-32 (read about the differences).
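
The b'...' and "can't use decode on a str" symptoms from the question both come from mixing up the two types. The following sketch, using only the standard library and a hypothetical temporary file, shows that str() on bytes produces the printable representation rather than decoding it, and that a file opened in text mode hands back str while binary mode hands back bytes:

```python
import os
import tempfile

text = "héllo"
data = text.encode("utf-8")        # str -> bytes

wrong = str(data)                  # the text "b'...'", not a decode
right = data.decode("utf-8")       # bytes -> str

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")   # hypothetical file name
    with open(path, "wb") as fp:
        fp.write(data)
    with open(path, "r", encoding="utf-8") as fp:
        as_text = fp.readline()    # text mode: str, has no .decode()
    with open(path, "rb") as fp:
        as_bytes = fp.readline()   # binary mode: bytes, has .decode()

print(wrong)   # b'h\xc3\xa9llo'
print(right)   # héllo
```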

However, since the cipher can handle any byte payload, I would remove the part that currently limits these functions to strings. I'd change them to expect and return bytes, and then either:

  • call them as x = aes.encrypt(s.encode('utf8')) and s = aes.decrypt(x).decode('utf8'), respectively, or
  • make wrapper functions for string handling.
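
The second option could look roughly like this. It is only a sketch: encrypt_text and decrypt_text are names made up for illustration, and cipher is assumed to be any object whose encrypt and decrypt methods take and return bytes, such as an AESCipher changed as described above.

```python
def encrypt_text(cipher, text: str, encoding: str = "utf-8") -> bytes:
    """Encode the string, then pass the bytes to the bytes-only cipher."""
    return cipher.encrypt(text.encode(encoding))


def decrypt_text(cipher, token: bytes, encoding: str = "utf-8") -> str:
    """Decrypt to bytes, then decode with the same encoding used to encrypt."""
    return cipher.decrypt(token).decode(encoding)
```

Keeping the encoding in the wrappers means the cipher class itself never has to know whether its payload was text.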

For encrypting files you can then directly do this:

with open('some.txt', 'rb') as fp:
    encrypted = aes.encrypt(fp.read())

and this would not impose any encoding assumptions at all, but encrypt the bytes of the file as they are.

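AES is a block cipher, and in CBC mode each block is chained to the previous one. With a single cipher object, cipher.encrypt(a) + cipher.encrypt(b) is therefore the same as cipher.encrypt(a + b), provided len(a) is a multiple of AES.block_size. For encrypting files that's very useful, because you can read the file incrementally in chunks of N * AES.block_size, with only the last chunk padded. This is a lot more memory-efficient than reading the whole file into memory first. Your current setup of encrypt and decrypt does not make use of that, because every call to encrypt creates a new IV and a new cipher object.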
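
A rough sketch of such chunked encryption follows. encrypt_stream is a hypothetical helper, not part of any library; it assumes cipher is one CBC cipher object reused for every chunk, that src is a regular file opened in 'rb' mode (so read() only returns a short chunk at the end of the file), and that the caller stores the IV separately. The output is raw bytes, not base64.

```python
def encrypt_stream(cipher, src, dst, block_size=16, blocks_per_chunk=4096):
    """Encrypt src into dst chunk by chunk, padding only the final chunk."""
    chunk_size = block_size * blocks_per_chunk
    chunk = src.read(chunk_size)
    while True:
        next_chunk = src.read(chunk_size)      # look ahead to detect the end
        if not next_chunk:
            # chunk is the last one: add PKCS#7-style padding (1..block_size)
            n = block_size - len(chunk) % block_size
            dst.write(cipher.encrypt(chunk + bytes([n]) * n))
            return
        dst.write(cipher.encrypt(chunk))       # full chunk, no padding
        chunk = next_chunk


# Possible usage with the imports from the question (not executed here):
#
#   iv = Random.new().read(AES.block_size)
#   cipher = AES.new(key, AES.MODE_CBC, iv)
#   with open('some.txt', 'rb') as src, open('some.enc', 'wb') as dst:
#       dst.write(iv)                          # keep the IV with the data
#       encrypt_stream(cipher, src, dst)
```

Decryption would mirror this: read the IV, decrypt chunk by chunk with one cipher object, and strip the padding from the final chunk only.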

Upvotes: 4
