Mehdi bahmanpour

Reputation: 614

Django Bulk_create for inserting Excel into the postgresql database

I'm using pandas in a Django app to read and process 3 Excel files with about 1000 rows in each file.

I also have 3 models with 10-15 fields each, so I decided to build 3 arrays of objects and use bulk_create to insert them into the database.

The problem is that processing takes too long, and most of the time nginx raises a 504 timeout error.

Is this the correct way to insert Excel data into the database?

Wouldn't it be better to break the data into parts and then use bulk_create on each one? For example, 10 parts with 100 rows each.

While processing, the server CPU sits at 80-95 percent and the database engine is fully occupied.

Upvotes: 0

Views: 456

Answers (1)

Jason

Reputation: 11363

bulk_create is a good solution, but it has a few caveats, as I'm sure you've found.

First, it defaults to inserting all of the items in the array in one shot. If you're dumping 10k inserts at once into a table with a number of indices, it takes the db some time to absorb them and rebuild the indices. Instead, you can use the batch_size param to chunk the inserts into smaller pieces, which may be more agreeable to your db.
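For illustration, a minimal sketch of the batch_size approach; the model (ImportedRow), its fields, and the file name are hypothetical stand-ins for your own:

```python
import pandas as pd

from myapp.models import ImportedRow  # hypothetical model with `name`/`value` fields

df = pd.read_excel("sheet1.xlsx")  # hypothetical file name

# Build the model instances from the DataFrame rows (column names assumed).
objs = [
    ImportedRow(name=row.name, value=row.value)
    for row in df.itertuples(index=False)
]

# batch_size splits the single giant INSERT into one query per 100 objects,
# which is easier on the database and its indices.
ImportedRow.objects.bulk_create(objs, batch_size=100)
```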

You may not have noticed the equivalent in the pandas docs, specifically chunksize.
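As a rough sketch: chunksize is exposed on pandas' text readers such as read_csv (read_excel itself does not accept it), so assuming the data can be exported to CSV, you could stream it in small pieces instead of loading everything up front:

```python
import pandas as pd

from myapp.models import ImportedRow  # same hypothetical model as above

# read_csv with chunksize yields DataFrames of 100 rows at a time,
# so only one small batch is built and inserted per iteration.
for chunk in pd.read_csv("sheet1.csv", chunksize=100):  # hypothetical CSV export
    objs = [
        ImportedRow(name=row.name, value=row.value)
        for row in chunk.itertuples(index=False)
    ]
    ImportedRow.objects.bulk_create(objs)
```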

So chances are good the entire issue is you're overwhelming your db, and need to optimize how you send it the data for inserts.

Upvotes: 1
