JanRainMan

Reputation: 45

Django efficient bulk_create with unique constraint

I searched for this and there doesn't appear to be a really good solution (most answers are years old). Is there a newer, good solution for bulk_creating objects that need to be unique?

OK, so I have lists containing ~1000 dicts each, with a unique constraint on dict['keyword']. So far I've been doing it this way:

self.get_existing_KeyO = list(
    KeyO.objects.filter(keyword__in=[x['keyword'] for x in self.data])
)

And then I bulk_create those that are not already in the database. I am using Django 1.10 (because I need the IDs of the created objects).
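
Roughly, the bulk_create step then looks like this (just a sketch; the non-keyword fields are omitted for brevity):

existing = set(
    KeyO.objects.filter(keyword__in=[x['keyword'] for x in self.data])
                .values_list('keyword', flat=True)
)

# Only insert the dicts whose keyword is not in the database yet.
# On PostgreSQL with Django 1.10+ the returned objects have their IDs set.
created = KeyO.objects.bulk_create(
    KeyO(keyword=x['keyword']) for x in self.data
    if x['keyword'] not in existing
)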

I am doing this with Celery (multiple threads), so there are conflicts (two threads adding to the database at the same time). Could get_or_create be more efficient? I am slightly afraid it would overload the DB, as sometimes I am adding 5-10 lists at the same time, which would result in ~10,000 queries.

Upvotes: 3

Views: 1590

Answers (1)

Kevin Christopher Henry

Reputation: 48922

The best approach will depend on how likely collisions are. If they're rare, then an optimistic concurrency approach using bulk_create should work fine. Something like:

from django.db import IntegrityError

while True:
    # Fetch the keywords that already exist in the database.
    existing = set(KeyO.objects.filter(keyword__in=[x['keyword'] for x in self.data])
                               .values_list("keyword", flat=True))

    try:
        # Insert only the objects whose keywords are not already present.
        KeyO.objects.bulk_create(KeyO(...) for x in self.data
                                 if x['keyword'] not in existing)
    except IntegrityError:
        # Another worker inserted a conflicting row in the meantime; retry.
        continue
    else:
        break

If collisions are common, then just using get_or_create in a loop should work fine. I wouldn't worry prematurely about performance problems until you actually encounter them.
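
For example, a minimal sketch of that loop, reusing the names from the question (any non-unique fields would go in defaults):

for x in self.data:
    obj, created = KeyO.objects.get_or_create(
        keyword=x['keyword'],
        # defaults={...},  # non-unique fields, if any
    )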

Upvotes: 2
