Reputation: 6659
I have a large Django project with a setup script that loads content into the database from several CSV files. Once in a while, I need to reset everything and re-add all the data from these files. The data also requires some post-processing once added. This takes a while, because the files are long and the code contains some unavoidable nested loops as well as many database queries.
In many cases the tasks are independent, so it should be possible to run them in parallel. I looked around for parallel processing libraries and decided to use the very simple multiprocessing module from the standard library.
Thus, the setup is quite simple. We define a function to run in parallel, and then call Pool. Simplified code:
from multiprocessing import Pool

def some_func(input):
    # code inserting data into Django here
    pass

with Pool(4) as p:
    p.map(some_func, [1, 2, 3, 4])
However, running the code results in database connection errors like the following, which others have reported as well:
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
It seems like the different worker processes are trying to share one connection, or maybe the connection is not passed on to the workers correctly.
How do I get parallel processing to work with Django database actions?
Upvotes: 2
Views: 3641
Reputation: 6659
After googling around, I was able to find an old (2009) related question on the Django Google groups:
Hi, I was recently debugging similar issue and came to a conclusion (which may be wrong of course :) that multiprocessing and Django DB connections don't play well together. I ended up closing Django DB connection first thing in the new process. It'll recreate a new connection when it needs one, but that one will have no references to the connection used by the parent.
So, my Process.start() calls a function which starts with:
from django.db import connection
connection.close()
This solved my problem.
Thus, to solve the issue, change the function to be something like this:
def some_func(input):
    # close the database connection inherited from the parent process;
    # Django opens a new one the next time it is needed
    from django.db import connection
    connection.close()
    # code inserting data into Django here
    pass
Then it worked fine.
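A possible variation on the same fix, which I have not benchmarked: rather than closing the connection at the top of every call, close it once per worker process through Pool's initializer argument. The helper names close_db_connections and run_in_parallel below are made up for illustration. The sketch assumes Django is already configured in the process that creates the pool, and that the workers inherit that setup (for example via the fork start method on Linux).

```python
from multiprocessing import Pool


def close_db_connections():
    """Close every Django database connection held by the current process."""
    # Imported lazily, as in the fix above, so the import happens inside
    # whichever process calls this function.
    from django.db import connections

    # Django reopens a connection on its own the next time one is needed.
    connections.close_all()


def run_in_parallel(func, items, processes=4):
    """Map func over items in a pool whose workers drop inherited connections.

    Hypothetical helper: the initializer runs once in each worker process
    when it starts, before that worker handles any item.
    """
    with Pool(processes, initializer=close_db_connections) as pool:
        return pool.map(func, items)
```

With this in place, some_func no longer needs the connection.close() call, and the original example becomes run_in_parallel(some_func, [1, 2, 3, 4]). Each worker then reconnects once instead of once per item, which may matter when the input list is long.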
Upvotes: 2