bdoubleu

Reputation: 6107

How to make data migration function more efficient

I'm cleaning up an older PostgreSQL database so that NULL is no longer used in string-based fields. To do this I'm using a Django data migration, which works on my local machine (although it's a bit slow), but the migration is being killed on production.

All database connections are closed during the migration, so I'm guessing it uses a ton of memory. I've tried doing this manually in the shell, but the process is still being killed. The same happens when I increase the batch size.

Is there a more efficient way of executing this function that won't use a ton of resources?

from django.db import models

def set_null_char_field_to_empty_string(apps, schema_editor):
    model_classes = [
        apps.get_model('order', 'Order'),
    ]
    for model in model_classes:
        char_fields = [f.name for f in model._meta.get_fields()
                       if isinstance(f, models.CharField)]
        if char_fields:
            filter_args = models.Q()
            for field in char_fields:
                filter_args |= models.Q(**{f'{field}__isnull': True})
            objs = model.objects.filter(filter_args)

            objs_to_update = []
            for obj in objs:
                for field in char_fields:
                    setattr(obj, field, getattr(obj, field) or '')
                objs_to_update.append(obj)
            model.objects.bulk_update(
                objs_to_update,
                fields=char_fields,
                batch_size=500
            )

Upvotes: 0

Views: 206

Answers (2)

Kevin Christopher Henry

Reputation: 48962

You should be able to accomplish this entirely on the database, without fetching any data at all. Just take one CharField at a time and construct a query to update the values in place. I haven't tested this, but something like:

from django.db import models

def set_null_char_field_to_empty_string(apps, schema_editor):
    model_classes = [
        apps.get_model('order', 'Order'),
    ]
    for model in model_classes:
        char_fields = [f.name for f in model._meta.get_fields()
                       if isinstance(f, models.CharField)]
        for field in char_fields:
            model.objects.filter(
                **{f'{field}__isnull': True}
            ).update(**{field: ''})
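The per-field `.update()` above compiles to a single `UPDATE ... WHERE ... IS NULL` statement per column, executed entirely on the database with no rows fetched into Python. A minimal sketch of the equivalent SQL, using `sqlite3` purely for illustration (the table and column names here are made up, not from the question's schema):

```python
# Sketch only: demonstrates the per-field UPDATE that the ORM call
# issues, without Django. Assumes a toy table "order_order".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_order (id INTEGER PRIMARY KEY, name TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO order_order (name, note) VALUES (?, ?)",
    [("widget", None), (None, "rush"), (None, None)],
)

char_fields = ["name", "note"]  # in the migration these come from model._meta
for field in char_fields:
    # Equivalent of model.objects.filter(**{f'{field}__isnull': True})
    #                             .update(**{field: ''})
    conn.execute(f'UPDATE order_order SET "{field}" = \'\' WHERE "{field}" IS NULL')

print(conn.execute("SELECT name, note FROM order_order ORDER BY id").fetchall())
# → [('widget', ''), ('', 'rush'), ('', '')]
```

Because no model instances are materialized, memory use stays flat regardless of how many rows match.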

Upvotes: 1

Igor Moraru

Reputation: 7729

The only method I know of that avoids holding all the objects for update in memory is select_for_update(), although I don't know how it behaves with large amounts of data.

from django.db import models, transaction

def set_null_char_field_to_empty_string(apps, schema_editor):
    model_classes = [
        apps.get_model('order', 'Order'),
    ]
    for model in model_classes:
        char_fields = [f.name for f in model._meta.get_fields()
                       if isinstance(f, models.CharField)]
        if char_fields:
            filter_args = models.Q()
            for field in char_fields:
                filter_args |= models.Q(**{f'{field}__isnull': True})
            objs = model.objects.select_for_update().filter(filter_args)
            with transaction.atomic():
                for obj in objs:
                    for field in char_fields:
                        setattr(obj, field, getattr(obj, field) or '')
                    # Persist the change; without this the loop has no effect.
                    obj.save(update_fields=char_fields)

Upvotes: 0
