Oved D

Reputation: 7442

How can I create an encrypted Django field that converts data when it's retrieved from the database?

I have a custom EncryptedCharField, which I basically want to appear as a CharField when interfacing with the UI, but which encrypts the value before storing it in the DB and decrypts it after retrieving it.

The custom fields documentation says to:

  1. add __metaclass__ = models.SubfieldBase
  2. override to_python to convert the data from its raw storage into the desired format
  3. override get_prep_value to convert the value before storing it to the db.

So you would think this would be easy enough: for 2. just decrypt the value, and for 3. just encrypt it.

Based loosely on a django snippet, and the documentation this field looks like:

from django.db import models

class EncryptedCharField(models.CharField):
  """Just like a CharField, but encrypts the value before it enters the
  database and decrypts it when it is retrieved."""
  __metaclass__ = models.SubfieldBase

  def __init__(self, *args, **kwargs):
    # Pop the custom 'cipher' option before CharField.__init__ sees it;
    # CharField would reject it as an unexpected keyword argument.
    cipher_type = kwargs.pop('cipher', 'AES')
    super(EncryptedCharField, self).__init__(*args, **kwargs)
    self.encryptor = Encryptor(cipher_type)

  def get_prep_value(self, value):
    return encrypt_if_not_encrypted(value, self.encryptor)

  def to_python(self, value):
    return decrypt_if_not_decrypted(value, self.encryptor)


def encrypt_if_not_encrypted(value, encryptor):
  if isinstance(value, EncryptedString):
    return value
  else:
    encrypted = encryptor.encrypt(value)
    return EncryptedString(encrypted)

def decrypt_if_not_decrypted(value, encryptor):
  if isinstance(value, DecryptedString):
    return value
  else:
    decrypted = encryptor.decrypt(value)
    return DecryptedString(decrypted)


class EncryptedString(str):
  pass

class DecryptedString(str):
  pass

And the Encryptor looks like:

import binascii
import random
import string

from django.conf import settings

class Encryptor(object):
  def __init__(self, cipher_type):
    # Import a PyCrypto cipher module by name, e.g. Crypto.Cipher.AES.
    imp = __import__('Crypto.Cipher', globals(), locals(), [cipher_type], -1)
    self.cipher = getattr(imp, cipher_type).new(settings.SECRET_KEY[:32])

  def decrypt(self, value):
    # Values should always be encrypted no matter what!
    # Raise an error if things may have been tampered with.
    return self.cipher.decrypt(binascii.a2b_hex(str(value))).split('\0')[0]

  def encrypt(self, value):
    if value is not None and not isinstance(value, EncryptedString):
      # Pad to a multiple of the block size: a '\0' marker, then random filler.
      padding = self.cipher.block_size - len(value) % self.cipher.block_size
      if padding and padding < self.cipher.block_size:
        value += "\0" + ''.join([random.choice(string.printable) for index in range(padding-1)])
      value = EncryptedString(binascii.b2a_hex(self.cipher.encrypt(value)))
    return value
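The padding scheme in encrypt/decrypt above can be checked on its own, without a cipher. A minimal sketch that copies the same logic into two standalone helpers; the names pad and unpad and the block size of 16 (AES) are assumptions for illustration:

```python
import random
import string

BLOCK_SIZE = 16  # assumption: AES block size in bytes

def pad(value, block_size=BLOCK_SIZE):
    """Pad like Encryptor.encrypt: a '\\0' marker, then random printable filler."""
    padding = block_size - len(value) % block_size
    if padding and padding < block_size:
        value += "\0" + "".join(random.choice(string.printable) for _ in range(padding - 1))
    return value

def unpad(value):
    """Strip like Encryptor.decrypt: keep everything before the first '\\0'."""
    return value.split("\0")[0]

padded = pad("secret")
print(len(padded))         # 16
print(unpad(padded))       # secret
print(len(pad("x" * 16)))  # 16: an exact multiple gets no padding at all
```

string.printable contains no "\0", so the filler never introduces a second marker; the trade-off is that a plaintext which itself contains "\0" would be truncated on decryption.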

When saving a model, an "Odd-length string" error occurs as a result of attempting to decrypt an already decrypted string. When debugging, it appears that to_python is called twice: the first time with the encrypted value, and the second time with the decrypted value, but as a plain str rather than a DecryptedString, which causes the error. Furthermore, get_prep_value is never called.
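The error message itself points at the cause: Encryptor.decrypt first hex-decodes its input with binascii.a2b_hex, and a plaintext of odd length fails there before the cipher is ever involved. A small sketch reproducing just that step; the helper name decrypt_hex is hypothetical:

```python
import binascii

def decrypt_hex(value):
    # Mirrors only the first step of Encryptor.decrypt: hex-decode the stored value.
    return binascii.a2b_hex(str(value))

print(decrypt_hex("6869"))  # b'hi': hex text, as the database stores it

try:
    decrypt_hex("hello")    # an already-decrypted plaintext of odd length
except binascii.Error as err:
    print(err)              # Odd-length string
```

An even-length plaintext would instead fail with a different message or, if it happened to consist only of hex digits, decode silently to garbage.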

What am I doing wrong?

This should not be that hard. Does anyone else think this Django field code is very poorly written, especially when it comes to custom fields, and not very extensible? Simple overridable pre_save and post_fetch methods would easily solve this problem.

Upvotes: 18

Views: 24739

Answers (5)

maulik13

Reputation: 3760

I think the issue is that to_python is also called when you assign a value to your custom field (possibly as part of validation). So the problem is to distinguish between to_python calls in the following situations:

  1. When a value from the database is assigned to the field by Django (that's when you want to decrypt the value)
  2. When you manually assign a value to the custom field, e.g. record.field = value

One hack you could use is to add a prefix or suffix to the value string and check for that instead of doing an isinstance check.
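A minimal sketch of that prefix idea in plain Python, without Django. The PREFIX value, the helper names to_db and from_db_or_assignment, and the string-reversing stand-in cipher are all hypothetical; reversal is not encryption and only keeps the example self-contained:

```python
PREFIX = "enc$"  # hypothetical marker that tags stored ciphertext

def to_db(value, encrypt):
    """Encrypt and tag a plaintext value; leave already-tagged values alone."""
    if value is None or value.startswith(PREFIX):
        return value
    return PREFIX + encrypt(value)

def from_db_or_assignment(value, decrypt):
    """Decrypt only tagged values, so plaintext assigned in code passes through."""
    if value is not None and value.startswith(PREFIX):
        return decrypt(value[len(PREFIX):])
    return value

def fake_encrypt(s):
    return s[::-1]  # stand-in only: reversing a string is NOT encryption

def fake_decrypt(s):
    return s[::-1]  # stand-in only

stored = to_db("secret", fake_encrypt)
print(stored)                                         # enc$terces
print(from_db_or_assignment(stored, fake_decrypt))    # secret
print(from_db_or_assignment("secret", fake_decrypt))  # secret, not decrypted twice
```

Unlike the isinstance check, the tag survives a round trip through a plain str. The trade-off is that a plaintext starting with the prefix would be misread as ciphertext, and the column's max_length has to leave room for the prefix.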

I was going to write an example, but I found this one (even better :)).

Check BaseEncryptedField: https://github.com/django-extensions/django-extensions/blob/2.2.9/django_extensions/db/fields/encrypted.py (link to an older version because the field was removed in 3.0.0; see Issue #1359 for reason of deprecation)

Source: Django Custom Field: Only run to_python() on values from DB?

Upvotes: 11

Mark Chackerian

Reputation: 23532

Since this question was originally answered, a number of packages have been written to solve this exact problem.

For example, as of 2018, the package django-encrypted-model-fields handles this with a syntax like

from django.db import models
from encrypted_model_fields.fields import EncryptedCharField

class MyModel(models.Model):
    encrypted_char_field = EncryptedCharField(max_length=100)
    ...

As a rule of thumb, it's usually a bad idea to roll your own solution to a security challenge when a more mature solution exists out there -- the community is a better tester and maintainer than you are.

Upvotes: 2

John Peters

Reputation: 1179

You need to add a to_python method that deals with a number of cases, including passing on an already decrypted value.

(warning: snippet is cut from my own code - just for illustration)

import pickle

def to_python(self, value):
    # _Param, str2dict, get_module and self.par_type come from the
    # author's own code and are not shown here.
    if not value:
        return None
    if isinstance(value, _Param):  # THIS IS THE PASSING-ON CASE
        return value
    elif isinstance(value, unicode) and value.startswith('{'):
        param_dict = str2dict(value)
    else:
        try:
            param_dict = pickle.loads(str(value))
        except Exception:
            raise TypeError('unable to process {}'.format(value))
    param_dict['par_type'] = self.par_type
    classname = '{}_{}'.format(self.par_type, param_dict['rule'])
    return getattr(get_module(self.par_type), classname)(**param_dict)

By the way:

Instead of get_db_prep_value you should use get_prep_value (the former is for database-specific conversions; see https://docs.djangoproject.com/en/1.4/howto/custom-model-fields/#converting-python-objects-to-query-values )

Upvotes: 1

Nathan Villaescusa

Reputation: 17639

You should be overriding to_python, like the snippet did.

If you take a look at the CharField class you can see that it doesn't have a value_to_string method.

The docs say that the to_python method needs to deal with three things:

  • An instance of the correct type
  • A string (e.g., from a deserializer).
  • Whatever the database returns for the column type you're using.

You are currently only dealing with the third case.

One way to handle this is to create a special class for a decrypted string:

class DecryptedString(str):
   pass

Then you can detect this class and handle it in to_python():

def to_python(self, value):
    if isinstance(value, DecryptedString):
        return value  # already decrypted: pass it through unchanged

    decrypted = self.encryptor.decrypt(value)
    return DecryptedString(decrypted)

This prevents you from decrypting more than once.
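A self-contained sketch of that guard, with the field method reduced to a plain function and a hypothetical stand-in for self.encryptor.decrypt (reversing a string is not real decryption). It counts how often the stand-in runs when to_python is called twice in a row:

```python
class DecryptedString(str):
    """Marker type: a str that has already been decrypted."""

calls = []  # records every call to the stand-in decrypt

def fake_decrypt(value):
    # Hypothetical stand-in for self.encryptor.decrypt; NOT real decryption.
    calls.append(value)
    return value[::-1]

def to_python(value):
    if value is None or isinstance(value, DecryptedString):
        return value  # already plaintext: pass it through unchanged
    return DecryptedString(fake_decrypt(value))

first = to_python("terces")  # value as read from the database
second = to_python(first)    # a second to_python call on the result
print(second, len(calls))    # secret 1
```

The guard only fires while the value keeps its DecryptedString type. The question reports that the second to_python call received a plain str, in which case this check alone would not prevent the double decryption.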

Upvotes: 4

Daniel Roseman

Reputation: 599638

You forgot to set the metaclass:

class EncryptedCharField(models.CharField):
    __metaclass__ = models.SubfieldBase

The custom fields documentation explains why this is necessary.

Upvotes: 3
