Reputation: 1297
I'm new to the topic I'm trying to understand; searching the internet brought me here, and now I need this in my code.
I use django-rest-framework, gunicorn, and Nginx.
Suppose I have 3 Gunicorn worker processes set up, and I have a very simple view that reads a value from the database, performs another task that takes around 1 second, increments the value by 1, and saves it back to the database.
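from rest_framework.response import Response
from rest_framework.views import APIView

class CreateView(APIView):
    def post(self, request):
        value = MyModel.objects.get(id=1).integerValueField  # read the current value
        otherTask()  # takes around 1 second (assume)
        updatedValue = value + 1
        MyModel.objects.filter(id=1).update(integerValueField=updatedValue)  # write it back
        return Response()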
Will this always work?
What if a different Gunicorn worker process is handling the requests of concurrent users? What if the database (the integerValueField field) is updated by a different worker process between this worker reading the value and writing it back? Is this locked somehow to maintain integrity?
If I can get valid links to read more about the topic, that would work well for me.
Upvotes: 7
Views: 2815
Reputation: 560
To expand on The Pjot's comment - no, the code you provided won't work reliably if you execute it with multiple Gunicorn workers. What happens here is called a race condition, and it isn't specific to Django - here is a discussion of exactly this in a more general database setting.
Now, what would happen in your specific case if multiple Gunicorn workers (or a single worker with multiple threads) access the same object looks roughly like this, if we assume that MyModel.objects.get(id=1).integerValueField is 100 at the beginning:
1. Worker 1 executes value = MyModel.objects.get(id=1).integerValueField, which will hit the database, retrieve the object with primary key 1 and store it in memory. value will be set to the value of integerValueField, which is 100 in our example.
2. Worker 1 starts executing otherTask().
3. Worker 2 executes value = MyModel.objects.get(id=1).integerValueField and, just like worker 1, it will store the current value of integerValueField from the database in value. value, again, will be 100, as the value hasn't yet changed in the database.
4. Worker 2 starts executing otherTask().
5. Worker 1 finishes otherTask() and executes updatedValue = value + 1, which sets updatedValue to 101, and then executes MyModel.objects.filter(id=1).update(integerValueField=updatedValue) to save it to the database.
6. Worker 2 finishes otherTask() and executes updatedValue = value + 1 as well - but updatedValue will also be 101 instead of 102, as the local copy of the database value is used. No additional database access happens here. After that it will execute MyModel.objects.filter(id=1).update(integerValueField=updatedValue) as well, which will update the database but won't change the value - it will still be 101.
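The interleaving described above can be reproduced without Django. The following is only a minimal sketch using the standard library: the dict stands in for the database row, two threads stand in for the two Gunicorn workers, and a barrier (an assumption made purely so the demo is deterministic) forces both workers to read before either one writes. The names database, worker and run_two_workers are made up for this illustration.

```python
import threading

# Stand-in for the MyModel row with id=1 (hypothetical, not a real database).
database = {"integerValueField": 100}


def run_two_workers():
    """Run two 'workers' that both read before either writes; return the final value."""
    database["integerValueField"] = 100
    # Released only once both workers have finished their read.
    both_have_read = threading.Barrier(2)

    def worker():
        value = database["integerValueField"]  # read: each worker sees 100
        both_have_read.wait()  # stand-in for otherTask(); the other worker reads meanwhile
        updated_value = value + 1  # computed from the stale local copy
        database["integerValueField"] = updated_value  # write: each worker stores 101

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return database["integerValueField"]


if __name__ == "__main__":
    print(run_two_workers())  # 101, not 102: one increment was lost
```

Two increments ran, but the stored value only went up by one, which is exactly the lost update from steps 5 and 6.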
What select_for_update does is lock the database row so that no other worker can lock or modify it at the same time (this concept is called mutually exclusive access and is often implemented through locking). The lock is held until the surrounding transaction ends. This will solve your issue of lost updates. However, what you should consider here is that you will block every other worker that needs this row while otherTask() is running (which is apparently a substantial time), and this can easily lead to long delays for your clients and worse.
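To make the trade-off concrete, here is a small in-process analogue, again only a sketch: a threading.Lock plays the role of the row lock, and other_task is a hypothetical short stand-in for otherTask(). The comment at the top shows roughly how the same shape might look in Django, assuming the MyModel and otherTask from the question; it is not tested here.

```python
import threading
import time

# Rough Django equivalent (assumes MyModel and otherTask from the question):
#   from django.db import transaction
#   with transaction.atomic():
#       obj = MyModel.objects.select_for_update().get(id=1)  # row stays locked until commit
#       otherTask()
#       obj.integerValueField += 1
#       obj.save(update_fields=["integerValueField"])

database = {"integerValueField": 100}  # stand-in for the MyModel row with id=1
row_lock = threading.Lock()  # plays the role of the database row lock


def other_task():
    time.sleep(0.05)  # hypothetical stand-in for the ~1 second otherTask()


def worker():
    # The lock is held across read, task and write, so workers run one at a time.
    with row_lock:
        value = database["integerValueField"]
        other_task()
        database["integerValueField"] = value + 1


def run_workers(count=2):
    """Run `count` concurrent workers and return the final stored value."""
    database["integerValueField"] = 100
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return database["integerValueField"]


if __name__ == "__main__":
    print(run_workers(2))  # 102: no increment is lost
```

No update is lost any more, but every worker now waits for the previous one to finish other_task() before it can even read the value, which is the delay described above.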
I'd really consider if there isn't a better way to solve this. If not I'd at least look into multi-threaded Gunicorn workers - here is a good discussion.
Upvotes: 5