Reputation: 1270
I'm trying to provide initial data using two sets of fixtures. The first fixture format looks like this:
{
    "pk": 1,
    "model": "data.Person",
    "fields": {
        "full": "Anna-Varney",
        "num": "I",
        "short": "Anna-Varney"
    }
},
I load it in first, and it loads fine in roughly 1-2 hours. My movie.json format looks like this:
{
    "pk": 1,
    "model": "data.Film",
    "fields": {
        "date": "2005-08-01",
        "rating": 8.3,
        "actors": [
            [
                "Anna-Varney"
            ]
        ],
        "name": "Like a Corpse Standing in Desperation (2005) (V)"
    }
},
Loading the movies fixture is taking an extremely long time: it's currently 20 hours in, and my computer is sluggish while it runs. I loaded similar fixtures two months ago, except that I was using MySQL then (I'm using Postgres now) and I've since added the date field to my model. Loading the movies fixture into my old MySQL database only took 2-3 hours. Is there a way to determine what step the fixture loading is at, or whether it has frozen?
For reference my models are:
class PersonManager(models.Manager):
    def get_by_natural_key(self, full):
        return self.get(full=full)

class Person(models.Model):
    objects = PersonManager()
    full = models.CharField(max_length=100, unique=True)
    short = models.CharField(max_length=100)
    num = models.CharField(max_length=5)

    def natural_key(self):
        return (self.full,)

    def __unicode__(self):
        return self.full

class Film(models.Model):
    name = models.TextField()
    date = models.DateField()
    rating = models.DecimalField(max_digits=3, decimal_places=1)
    actors = models.ManyToManyField('Person')

    def __unicode__(self):
        return self.name
Upvotes: 4
Views: 5593
Reputation: 11337
If you are loading your fixtures via the command line:
python manage.py loaddata --database=MY_DB_LABEL fixtures/my_fixture.json;
or perhaps programmatically through the shell:
os.system('python manage.py loaddata --database=%s fixtures/my_fixture.json;' % MY_DB_LABEL)
Fixture loading will be SLOW. (I have not investigated why. Presumably, there are many unnecessary intermediate database saves being made.)
SOLUTION: Switch to loading your fixtures programmatically via Python using a single transaction:
from django.db import transaction
from django.core.management import call_command
with transaction.atomic(using=MY_DB_LABEL):
    call_command('loaddata', 'fixtures/my_fixture.json', database=MY_DB_LABEL)
    call_command('loaddata', 'fixtures/my_other_fixture.json', database=MY_DB_LABEL)
The fixture loading will speed up DRAMATICALLY.
Note that the database and using parameters here are optional. If you are using a single database, they are unnecessary. But if you are using multiple databases like me, you will probably want them to control which database the fixture data is loaded into.
Upvotes: 2
Reputation: 4779
For most cases you can speed things up a lot by loading your dumped data programmatically and using bulk_create
Example:
from collections import defaultdict

from django.core import serializers

obj_dict = defaultdict(list)
with open('my_fixtures.json') as f:
    # organize the deserialized objects by model class
    for item in serializers.deserialize('json', f):
        obj = item.object
        obj_dict[obj.__class__].append(obj)

# one bulk INSERT per model class instead of one query per object
for cls, objs in obj_dict.items():
    cls.objects.bulk_create(objs)
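For very large fixtures it can also help to bound the size of each INSERT. bulk_create accepts a batch_size argument for exactly this; if you'd rather control the batching yourself, a small pure-Python helper works too (a sketch, and the name batched is my own, not a Django API):

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# usage sketch: insert 500 rows at a time
# for chunk in batched(objs, 500):
#     cls.objects.bulk_create(chunk)
```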
Upvotes: 2
Reputation: 23920
Because Django runs in autocommit mode, it asks the database to make really sure that every single object, immediately after it is created, is saved and synced to a physical location on the drive. This limits the rate at which objects can be saved to the speed of the disk.
You need to use the @transaction.atomic decorator or the with transaction.atomic(): context manager, so the database only has to make sure everything is saved safely once, at the end.
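The effect is easy to see outside Django, too. A minimal sketch with the standard-library sqlite3 module (not Django code, just an illustration of one commit versus one commit per row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (full TEXT)")

# In autocommit mode each INSERT would be its own transaction, and each
# one has to be made durable before the next can run. Wrapping the loop
# in a single transaction defers that cost to one commit at the end,
# which is the same idea as Django's transaction.atomic().
with conn:  # opens a transaction, commits when the block exits
    for i in range(1000):
        conn.execute("INSERT INTO person (full) VALUES (?)", (f"person {i}",))

count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```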
You can read more about transactions in Django documentation.
I'd even recommend setting ATOMIC_REQUESTS to True in the database configuration when using PostgreSQL with Django. That way every browser request is automatically served in one transaction, committed only if the resulting view runs successfully.
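In settings.py that looks something like this (a sketch; the NAME and USER values are placeholders for your own connection details, and the ENGINE path may differ by Django version):

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',            # placeholder
        'USER': 'myuser',          # placeholder
        'ATOMIC_REQUESTS': True,   # wrap each request in one transaction
    }
}
```

Note that ATOMIC_REQUESTS applies to view requests; for fixture loading you still want an explicit transaction.atomic() block.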
Upvotes: 1