glormph

Reputation: 1004

Django race condition: code does not fetch newly created records

I am running Django 1.11.20 (with ATOMIC_REQUESTS: True) and Postgres, and have a view that basically does:

job = Job(name='hello')
job.save()
files = JobFile.objects.create(job_id=job.id, file='myfile.txt')
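
For context, my understanding is that ATOMIC_REQUESTS wraps the whole view in a transaction, so the above should be equivalent to something like:

from django.db import transaction

with transaction.atomic():
  job = Job(name='hello')
  job.save()
  JobFile.objects.create(job_id=job.id, file='myfile.txt')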

Most of the time this works fine, but every so often another process (run by cron) checks the JobFile table and finds no record for an existing Job when doing:

jobs = Job.objects.filter(timestamp=my_timestamp)
job_files = {}
for jf in JobFile.objects.filter(job__in=jobs):
  try:
    job_files[jf.job_id].add(jf.file)
  except KeyError:
    job_files[jf.job_id] = set([jf.file])

for job in jobs:
  files = job_files[job.id] if job.id in job_files else set()
  print('Job {} found with files {}'.format(job.id, files))

# output when this problem occurs is typically:
# Job 123 found with files set()

The process logs that it found a Job without any JobFile and errors out, but when I check the DB somewhat later, the JobFile is there just fine.

I am scratching my head as to why it doesn't find the JobFile records. Investigating the latest occurrences of this problem, I found that the cron process started about 0.1s before the records were created and finished shortly after, which makes me suspect some sort of timing problem. But my (limited) understanding is that since this all happens in a single view, ATOMIC_REQUESTS should ensure that both objects exist. My other suspect was the create method, but according to its source it simply returns after calling save. What am I missing?

Upvotes: 1

Views: 163

Answers (1)

Alasdair

Reputation: 308839

It looks like you have a race condition. Postgres defaults to the READ COMMITTED isolation level (and Django uses the database's default), which means that queries in your cron job will see new rows as soon as the transaction that created them commits, even if that happens partway through the cron run.

I've added comments to your code to explain the issue:

# This doesn't cause a query because Django querysets are lazy
jobs = Job.objects.filter(timestamp=my_timestamp)
job_files = {}
# This runs a query (with jobs as a subquery) and fetches the JobFiles committed at this point
for jf in JobFile.objects.filter(job__in=jobs):
  try:
    job_files[jf.job_id].add(jf.file)
  except KeyError:
    job_files[jf.job_id] = set([jf.file])

# During the loop above, extra jobs and jobfiles are saved to the database

# This line causes the jobs queryset to be evaluated. It includes the jobs committed in the meantime, whose files were never fetched
for job in jobs:
  files = job_files[job.id] if job.id in job_files else set()
  print('Job {} found with files {}'.format(job.id, files))
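
The interleaving that produces your symptom looks roughly like this:

# cron job                            web request
# --------                            -----------
# JobFile query runs; the Job
# and JobFile haven't been
# committed yet, so nothing is
# fetched for that job
#                                     transaction commits (Job + JobFile together)
# jobs queryset is evaluated;
# the newly committed Job is now
# visible, but its files were
# never collected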

You can avoid the error by using list() to force the jobs queryset to be evaluated at the beginning of the script.

jobs = list(Job.objects.filter(timestamp=my_timestamp))
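
Because the Job and its JobFile are committed in the same transaction, any job in that list will already have its JobFile visible to the subsequent query. Putting it together, the cron script would then start with something like this (a sketch; dict.setdefault and dict.get also tidy up the lookups, but aren't part of the fix):

jobs = list(Job.objects.filter(timestamp=my_timestamp))

job_files = {}
for jf in JobFile.objects.filter(job__in=jobs):
  job_files.setdefault(jf.job_id, set()).add(jf.file)

for job in jobs:
  files = job_files.get(job.id, set())
  print('Job {} found with files {}'.format(job.id, files))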

Upvotes: 2
