How to plug in a specific validator for all cases of a built-in type?

Question

I recently noticed that some user-submitted entries in my database contain incorrectly encoded strings, such as Ã³ where ó was clearly meant. They come from text copy-pasted from other websites that aren't properly encoded, which is beyond my control. I found that I can add a validator to catch such cases and raise an exception. Here's an example with an attached model:

from django.db import models

from django.utils.translation import gettext_lazy as _
from django.core.exceptions import ValidationError
import ftfy

def validate_ftfy(value):
    # fix_text is ftfy's main entry point (older releases exposed it as ftfy.ftfy)
    value_ftfy = ftfy.fix_text(value)
    if value_ftfy != value:
        raise ValidationError(
            _('Potential UTF-8 encoding error: %(value)r'
              ' decoded to %(value_ftfy)r.'),
            params={'value': value, 'value_ftfy': value_ftfy}
        )

class Message(models.Model):
    content = models.CharField(max_length=1000, validators=[validate_ftfy])

    def save(self, *args, **kwargs):
        # save() does not run validators on its own, so call full_clean() first
        self.full_clean()
        return super().save(*args, **kwargs)

The problem is that, now that I've discovered it, I see no reason to skip it on any of my CharField, TextField and similar fields. Is there a way to plug this validator into all text field types, so that if any non-binary value contains mis-decoded UTF-8, I can count on it not making it to the database?
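
For reference, here is a minimal sketch of how I assume the corruption arises: UTF-8 bytes get decoded as Latin-1 somewhere upstream. It uses only the standard library. The variable names are illustrative, and the real sites may well use a different single-byte encoding (Windows-1252, for instance).

```python
# Assumption: the source site decodes UTF-8 bytes as Latin-1.
original = "\u00f3"  # the character ó

# Encode correctly, then decode with the wrong codec to reproduce the damage.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the wrong step recovers the text in this simple case.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled), repr(repaired))
```

This is the kind of value the validator above is meant to reject. ftfy handles many more cases than this single round trip.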
