d6bels

Reputation: 1482

Django ValidationError when deleting queryset

I am writing a Django command to delete data older than x days from my app.

The filtering is made using the following:

qs = Data.objects.filter(date_created__lte=timezone.now()-timedelta(days=days_del))

With days_del being an integer and date_created a DateTimeField.
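The cutoff used by the filter is plain datetime arithmetic, which can be sanity-checked outside Django (days_del=30 is just an example value):

```python
from datetime import datetime, timedelta, timezone

days_del = 30  # example value; supplied via --days in the command
cutoff = datetime.now(timezone.utc) - timedelta(days=days_del)
# Every row with date_created <= cutoff matches the filter.
```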

Trying to either print this queryset or call .delete() on it results in a JSONDecodeError followed by a ValidationError. I don't really get why this happens, or how I can stop it from trying to decode the JSON fields in this case.

Note that I am using the jsonfield pypi package and the Data model has a JSONField.

It is possible that some rows contain stale, truncated data that cause the problem (see traceback). Is there a way to skip the validation and proceed with the deletion anyway?

Traceback (most recent call last):
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/jsonfield/fields.py", line 83, in pre_init
    return json.loads(value, **self.load_kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 464 (char 463)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/webapps/myproj/server/mirrors/management/commands/data_cleanup.py", line 39, in handle
    qs.delete()
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/query.py", line 616, in delete
    collector.collect(del_query)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/deletion.py", line 191, in collect
    reverse_dependency=reverse_dependency)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/deletion.py", line 89, in add
    if not objs:
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/query.py", line 254, in __bool__
    self._fetch_all()
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/query.py", line 1118, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/query.py", line 63, in __iter__
    obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/base.py", line 583, in from_db
    new = cls(*values)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/base.py", line 502, in __init__
    _setattr(self, field.attname, val)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/jsonfield/subclassing.py", line 43, in __set__
    obj.__dict__[self.field.name] = self.field.pre_init(value, obj)
  File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/jsonfield/fields.py", line 85, in pre_init
    raise ValidationError(_("Enter valid JSON"))
django.core.exceptions.ValidationError: ['Enter valid JSON']
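The root cause is reproducible outside Django: per the traceback, jsonfield calls json.loads on the stored text when each model instance is built, and a truncated value raises exactly this error. A minimal stdlib-only reproduction:

```python
import json

# A value cut off mid-string, as a truncated database row might be.
truncated = '{"key": "unterminated'

try:
    json.loads(truncated)
    decoded = True
except json.JSONDecodeError as exc:
    decoded = False
    error = str(exc)
# error mentions "Unterminated string", matching the traceback above.
```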

I am deleting a lot of data at once, so maybe there's a better way to handle this. Either way, fixing the old stale data is not an option here.

Thanks


Here is the command file:

from django.core.management.base import BaseCommand
from django.utils import timezone

from datetime import timedelta

from confluence_core.models import Data



class Command(BaseCommand):
    help = 'Delete data older than given days'

    def add_arguments(self, parser):
        parser.add_argument("-d", "--days", type=int, dest='days', required=True, help="Days limit")
        parser.add_argument("-c", "--confirm", action='store_true', dest='confirm', default=False, required=False, help="Confirm before deletion")

    def handle(self, *args, **options):

        days_del = options['days']
        do_delete = False

        qs = Data.objects.filter(date_created__lte=timezone.now()-timedelta(days=days_del))

        if qs.count() > 0:

            if options['confirm']:
                print(f"{qs.count():,} data entries will be deleted.")
                ret = input("Confirm ? (y/n)\n")
                if ret in ['y', 'Y', 'yes']:
                    do_delete = True
            else:
                do_delete = True

            if do_delete:
                print(f"Deleting {qs.count():,} data entries...")
                qs.delete()
            else:
                print("Not deleting anything.")

        else:
            print("No data to delete.")

        print("Done.")

Upvotes: 1

Views: 514

Answers (1)

bruno desthuilliers

Reputation: 77912

Given that the ORM force-loads the queryset when deleting, cf:

File "/root/virtualenvs/myproj-prod/lib/python3.6/site-packages/django/db/models/deletion.py", line 89, in add
    if not objs:

the only workaround I can think of (that does not require forking or monkeypatching anything) would be to first update the whole queryset so the JSON field is set to something valid, i.e.:

qs.update(name_of_the_field={})
qs.delete()

But this won't prevent other issues caused by your data being inconsistent, so the real solution is obviously to clean up your whole dataset.
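For a very large queryset, the same two-step (neutralize the JSON field, then delete) can be applied in primary-key batches rather than one huge UPDATE. The chunking itself is plain Python; the field name `payload` and the batch size are placeholders, and in the command the pks would come from qs.values_list("pk", flat=True):

```python
def chunks(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# In the command, each batch would be processed roughly as:
#   Data.objects.filter(pk__in=batch).update(payload={})  # 'payload' is a placeholder name
#   Data.objects.filter(pk__in=batch).delete()
pks = list(range(25_000))           # stand-in for real primary keys
batches = list(chunks(pks, 10_000))
```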

Upvotes: 2
