Reputation: 11561
I recently noticed that some user-submitted entries in my database contain incorrectly encoded strings, such as Ã³ where ó was clearly meant. They come from text copy-pasted from other websites that aren't properly encoded, which is beyond my control. I discovered that I can add this validator to catch such cases and raise an exception. Here's an example with an attached model:
from django.db import models
from django.utils.translation import gettext_lazy as _
from django.core.exceptions import ValidationError
import ftfy


def validate_ftfy(value):
    # Reject the value if ftfy would change it, i.e. it looks mis-encoded.
    value_ftfy = ftfy.ftfy(value)
    if value_ftfy != value:
        raise ValidationError(
            _('Potential UTF-8 encoding error: %(value)r'
              ' decoded to %(value_ftfy)r.'),
            params={'value': value, 'value_ftfy': value_ftfy}
        )


class Message(models.Model):
    content = models.CharField(max_length=1000, validators=[validate_ftfy])

    def save(self, *args, **kwargs):
        # Model.save() does not run validators on its own, so force them here.
        self.full_clean()
        return super(Message, self).save(*args, **kwargs)
The problem is that, now that I've discovered it, I see no reason to skip it on any of my CharField, TextField and similar fields. Is there a way to plug this validator into all text field types, so that if anything non-binary has invalid UTF-8, I can count on it not making it to the database?
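For context, garbling of this kind typically arises when UTF-8 bytes are decoded as Latin-1 (or Windows-1252). The following is a minimal stdlib-only sketch of that round trip. It does not use ftfy and is not how ftfy works internally; the helper names `to_mojibake` and `undo_mojibake` are hypothetical, chosen only for illustration.

```python
def to_mojibake(text: str) -> str:
    """Simulate the bug: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_mojibake(text: str) -> str:
    """Try to reverse the bug; return the input unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 encodable, or the bytes are not valid UTF-8:
        # the text was most likely fine to begin with.
        return text


garbled = to_mojibake("ó")
print(repr(garbled))                 # 'Ã³'
print(repr(undo_mojibake(garbled)))  # 'ó'
print(repr(undo_mojibake("ó")))      # 'ó' (already correct, left alone)
```

A check of the form "does repairing the string change it?" is the same idea the validator above relies on, with ftfy covering many more cases than this single encoding pair.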
Upvotes: 0
Views: 35
Reputation: 4150
There is no hook for adding extra validators to the built-in fields, and I'm not sure it would be a good idea, as they are used elsewhere in Django core.
I think the best option for you is to define a custom field with the validation already applied and use it in place of CharField, e.g.:
class FtfyCharField(models.CharField):
    # Applied to every instance of this field, no validators= argument needed.
    default_validators = [validate_ftfy]


class Message(models.Model):
    content = FtfyCharField(max_length=1000)
If you want to apply it to other field types, you could implement it as a mixin, e.g.:
class FtfyFieldMixin:
    default_validators = [validate_ftfy]


# List the mixin first so its default_validators takes precedence in the MRO.
class FtfyCharField(FtfyFieldMixin, models.CharField):
    pass


class FtfyTextField(FtfyFieldMixin, models.TextField):
    pass
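One thing to watch with the mixin approach: default_validators is a plain class attribute, so Python's method resolution order decides which class's list is used. For a field class that defines its own default_validators (I believe EmailField does), a mixin listed after it would be silently ignored. Here is a minimal sketch with stand-in classes, not Django's real fields; every name below is hypothetical and the validators are plain strings for illustration:

```python
class Field:
    """Stand-in for a base field class with no validators of its own."""
    default_validators = []


class EmailLikeField(Field):
    """Stand-in for a field that ships its own default validator."""
    default_validators = ["validate_email"]


class FtfyMixin:
    """Stand-in for the mixin from the answer."""
    default_validators = ["validate_ftfy"]


class MixinLast(EmailLikeField, FtfyMixin):
    pass


class MixinFirst(FtfyMixin, EmailLikeField):
    pass


class Combined(EmailLikeField):
    # Extend the parent's list explicitly so neither validator is lost.
    default_validators = [*EmailLikeField.default_validators, "validate_ftfy"]


print(MixinLast.default_validators)   # ['validate_email']
print(MixinFirst.default_validators)  # ['validate_ftfy']
print(Combined.default_validators)    # ['validate_email', 'validate_ftfy']
```

So listing the mixin first makes its list win, while extending the parent's list explicitly keeps both validators.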
Upvotes: 1