gigaTitan

Reputation: 21

Increasing INSERT Performance in Django For Many Records of HUGE Data

So I've been trying to solve this for a while now and can't seem to find a way to speed up insert performance with Django, despite the many suggestions and tips I've found on Stack Overflow and through many Google searches.

So basically I need to insert a LOT of records (~2 million) through Django into my MySQL DB, with each record being a whopping 180KB. I've scaled my testing down to 2,000 inserts yet still can't get the running time down to a reasonable amount. 2,000 inserts currently take approximately 120 seconds.
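To give an idea of the shape of the code, here is a stripped-down sketch of a batched bulk_create loop (MyRecord, its payload field, and the batch size are just placeholders, not my exact code):

    # Placeholder model and field names; the real records carry ~180KB each.
    from myapp.models import MyRecord

    BATCH_SIZE = 2000

    def insert_records(rows):
        """rows is an iterable of dicts, one per record."""
        buffer = []
        for row in rows:
            buffer.append(MyRecord(payload=row["payload"]))
            if len(buffer) >= BATCH_SIZE:
                # One multi-row INSERT per batch instead of one per object.
                MyRecord.objects.bulk_create(buffer, batch_size=BATCH_SIZE)
                buffer = []
        if buffer:
            MyRecord.objects.bulk_create(buffer, batch_size=BATCH_SIZE)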

So I've tried ALL of the following (and many combinations of each) to no avail:

Apologies if I forgot to list something, but I've tried so many different things at this point that I can't even keep track, haha.

I would greatly appreciate a little help speeding up performance from someone who has maybe had to perform a similar task with Django database insertions.

Please let me know if I've left out any necessary information!

Upvotes: 2

Views: 3346

Answers (3)

gigaTitan

Reputation: 21

So I found that editing the mysql /etc/mysql/my.cnf file and configuring some of the InnoDB settings significantly increased performance.

I set:

  • innodb_buffer_pool_size = 9000M (roughly 75% of your system RAM)
  • innodb_log_file_size = 2000M (20%-30% of the value above)
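In /etc/mysql/my.cnf that section ends up looking roughly like this (the exact values depend on how much RAM your machine has):

    [mysqld]
    # roughly 75% of system RAM
    innodb_buffer_pool_size = 9000M
    # roughly 20%-30% of the buffer pool size
    innodb_log_file_size = 2000M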

Restarting the MySQL server after that cut 50 inserts down from ~3 seconds to ~0.8 seconds. Not too bad!

Now I'm noticing the inserts are gradually taking longer and longer as the amount of data grows. 50 inserts start at about 0.8 seconds, but after 100 or so batches the average is up to 1.4 seconds and keeps increasing.

Will report back if solved.

Upvotes: 0

Camilo Torres

Reputation: 488

You can also try deleting any indexes on the tables (and any other constraints), then recreating the indexes and constraints after the insert.

Updating indexes and checking constraints can slow down every insert.
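A rough sketch of what that could look like with raw SQL through Django's connection (the table, index, and column names here are made up):

    from django.db import connection

    def bulk_load_without_index(do_inserts):
        # Drop a secondary index before the bulk load...
        with connection.cursor() as cursor:
            cursor.execute("ALTER TABLE myapp_myrecord DROP INDEX idx_payload")
        try:
            do_inserts()  # your bulk_create / INSERT loop
        finally:
            # ...and recreate it once the data is in.
            with connection.cursor() as cursor:
                cursor.execute(
                    "ALTER TABLE myapp_myrecord "
                    "ADD INDEX idx_payload (payload_hash)"
                )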

Upvotes: 0

Christine Schäpers

Reputation: 1318

This is out of Django's scope, really. Django just translates your Python into an INSERT INTO statement. For the most performance on the Django layer, skipping it entirely (by writing raw SQL) might be best, even though Python processing is pretty fast compared to the IO of a SQL database.
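For example, a raw multi-row insert through Django's database connection could look something like this (table and column names are placeholders):

    from django.db import connection

    def raw_bulk_insert(rows):
        """rows is a list of 1-tuples, e.g. [(payload1,), (payload2,), ...]."""
        with connection.cursor() as cursor:
            # One executemany call sends the whole batch without ORM overhead.
            cursor.executemany(
                "INSERT INTO myapp_myrecord (payload) VALUES (%s)",
                rows,
            )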

You should rather focus on the database. I'm a Postgres person, so I don't know what config options MySQL has, but there is probably some fine-tuning available.
If you have done that and there is still no improvement, you should consider using SSDs, SSDs in RAID 0, or even an in-memory DB, to skip IO times. Sharding may be a solution too: splitting the task and executing the parts in parallel.

If the inserts are not time critical, however, i.e. they can be done whenever but shouldn't block the page from loading, I recommend Celery. There you can queue a task to be executed asynchronously, whenever there is time.
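A minimal sketch, assuming Celery is already configured for the project (task, model, and field names are placeholders):

    from celery import shared_task

    from myapp.models import MyRecord

    @shared_task
    def insert_records_async(rows):
        # Runs in a worker, so the web request returns immediately.
        MyRecord.objects.bulk_create(
            [MyRecord(payload=r) for r in rows],
            batch_size=2000,
        )

    # In the view: insert_records_async.delay(rows)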

Upvotes: 1
