dl8

Reputation: 1270

Django fixture loading very slow

I'm trying to provide initial data using two sets of fixtures. The first fixture's format looks like this:

  {
    "pk": 1,
    "model": "data.Person",
    "fields": {
      "full": "Anna-Varney",
      "num": "I",
      "short": "Anna-Varney"
    }
  },

I load it first, and it loads fine in roughly 1-2 hours. My movie.json format looks like this:

  {
    "pk": 1,
    "model": "data.Film",
    "fields": {
      "date": "2005-08-01",
      "rating": 8.3,
      "actors": [
        [
          "Anna-Varney"
        ]
      ],
      "name": "Like a Corpse Standing in Desperation (2005) (V)"
    }
  },

Loading the movies fixture has taken an extremely long time: it is currently 20 hours in, and my computer is sluggish while it runs. I loaded similar fixtures 2 months ago, but back then I was using MySQL (I'm using Postgres now) and my model did not yet have the date field. Loading the movies fixture into my old MySQL database took only 2-3 hours. Is there a way to determine which step the fixture loading is at, or whether it has frozen?

For reference my models are:

from django.db import models

class PersonManager(models.Manager):
    def get_by_natural_key(self, full):
        return self.get(full=full)

class Person(models.Model):
    objects = PersonManager()
    full = models.CharField(max_length=100, unique=True)
    short = models.CharField(max_length=100)
    num = models.CharField(max_length=5)
    def natural_key(self):
        return (self.full,)

    def __unicode__(self):
        return self.full


class Film(models.Model):
    name = models.TextField()
    date = models.DateField()
    rating = models.DecimalField(max_digits=3, decimal_places=1)
    actors = models.ManyToManyField('Person')

    def __unicode__(self):
        return self.name

Upvotes: 4

Views: 5593

Answers (3)

Todd Ditchendorf

Reputation: 11337

If you are loading your fixtures via the command line:

python manage.py loaddata --database=MY_DB_LABEL fixtures/my_fixture.json;

or perhaps programmatically through the shell:

import os
os.system('python manage.py loaddata --database=%s fixtures/my_fixture.json;' % MY_DB_LABEL)

Fixture loading will be SLOW. (I have not investigated why. Presumably, there are many unnecessary intermediate database saves being made.)


SOLUTION: Switch to loading your fixtures programmatically via Python using a single transaction:

from django.db import transaction
from django.core.management import call_command

with transaction.atomic(using=MY_DB_LABEL):
    call_command('loaddata', 'fixtures/my_fixture.json', database=MY_DB_LABEL)
    call_command('loaddata', 'fixtures/my_other_fixture.json', database=MY_DB_LABEL)

The fixture loading will speed up DRAMATICALLY.


Note that the database and using parameters here are optional. If you are using a single database, they are unnecessary. But if you are using multiple databases, as I am, you will probably want them to control which database the fixture data is loaded into.

Upvotes: 2

jlivni

Reputation: 4779

In most cases you can speed things up a lot by loading your dumped data programmatically and using bulk_create.

Example:

from collections import defaultdict
from django.core import serializers

obj_dict = defaultdict(list)
# open the fixture in a with block so the file is closed afterwards
with open('my_fixtures.json') as fixture:
    # organize by model class
    for item in serializers.deserialize('json', fixture):
        obj = item.object
        obj_dict[obj.__class__].append(obj)

# one bulk_create call per model class
for cls, objs in obj_dict.items():
    cls.objects.bulk_create(objs)
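
One caveat: bulk_create does not populate many-to-many relations such as Film.actors, so those links would have to be written separately. Here is a minimal sketch, assuming each deserialized item exposes m2m_data (a dict mapping a field name to a list of related primary keys), as Django's DeserializedObject does. The through_pairs helper and the stand-in demo objects are hypothetical and not part of Django.

```python
from types import SimpleNamespace


def through_pairs(items, field_name):
    """Yield (object_pk, related_pk) pairs for one many-to-many field."""
    for item in items:
        for related_pk in item.m2m_data.get(field_name, []):
            yield item.object.pk, related_pk


# Stand-in objects shaped like DeserializedObject, for illustration only.
demo_items = [
    SimpleNamespace(object=SimpleNamespace(pk=1), m2m_data={"actors": [10, 11]}),
    SimpleNamespace(object=SimpleNamespace(pk=2), m2m_data={}),
]
demo_pairs = list(through_pairs(demo_items, "actors"))

# Hypothetical use inside the project, once the Film rows have been created:
#   Through = Film.actors.through
#   Through.objects.bulk_create(
#       [Through(film_id=f, person_id=p) for f, p in through_pairs(items, "actors")]
#   )
```

For this to work, the deserialized items would need to be kept in a list, since the deserializer yields each item only once.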

Upvotes: 2

Tometzky

Reputation: 23920

Because Django runs in autocommit mode, it asks the database to make sure that every single object is saved and synced to a physical location on disk as soon as it is created. This limits the rate at which objects can be saved to the speed of the disk.

You need to use the @transaction.atomic decorator or the with transaction.atomic(): context manager so that the database only has to make sure everything is safely saved once, at the end.
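
The effect of committing once instead of once per row can be illustrated outside Django. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so the absolute numbers will not match a real setup. The insert_rows and compare helpers and the person table are made up for the illustration.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(path, names, commit_each):
    """Insert names into a fresh table, committing per row or once at the end."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE person (full TEXT UNIQUE)")
        conn.commit()
        start = time.perf_counter()
        for name in names:
            conn.execute("INSERT INTO person (full) VALUES (?)", (name,))
            if commit_each:
                conn.commit()  # one durable write per row
        conn.commit()  # final commit; nothing is pending if every row was committed
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()
    return count, elapsed


def compare(n=200):
    """Return ((count, seconds), (count, seconds)) for per-row vs single commit."""
    names = ["person-%d" % i for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        per_row = insert_rows(os.path.join(tmp, "per_row.db"), names, True)
        single = insert_rows(os.path.join(tmp, "single.db"), names, False)
    return per_row, single


if __name__ == "__main__":
    (n1, t1), (n2, t2) = compare()
    print("per-row commit: %d rows in %.3fs" % (n1, t1))
    print("single commit:  %d rows in %.3fs" % (n2, t2))
```

Both runs store the same rows; only the number of commits differs, which is the part transaction.atomic() changes.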

You can read more about transactions in Django documentation.

I'd even recommend setting ATOMIC_REQUESTS to True in the database configuration when using PostgreSQL with Django. This way every browser request is automatically served in a single transaction, which is committed only if the view runs successfully.
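
As a rough sketch, the setting goes into the relevant entry of DATABASES in settings.py. The engine path and database name below are placeholders and would need to match your own project.

```python
# settings.py (fragment); NAME is a placeholder, not taken from the question
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # wrap each request's view in a single transaction
        "ATOMIC_REQUESTS": True,
    }
}
```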

Upvotes: 1
