Reputation: 45
I searched for this and there doesn't appear to be a really good solution (most answers are years old). Is there any newer good solution for bulk-creating objects that need to be unique?
OK, so I have lists containing ~1000 dicts, with a unique constraint on dict['keyword']. So far I've been doing it this way:
self.get_existing_KeyO = list(
    KeyO.objects.filter(keyword__in=[x['keyword'] for x in self.data])
)
Then I bulk_create those that are not already in the database. I am using Django 1.10 (because I need the IDs of the created objects).
I am doing this with Celery (multiple workers), so there are conflicts (two workers adding to the database at the same time). Would get_or_create be more efficient? I am slightly afraid it would overload the DB, as sometimes I am adding 5-10 lists at the same time, which would result in ~10,000 queries.
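To make the dedup step concrete, here is a minimal sketch of the filtering logic in plain Python (no ORM; the helper name is made up, only KeyO's 'keyword' field comes from the question):

```python
def split_new_keywords(data, existing_keywords):
    """Keep only dicts whose 'keyword' is not already in the database,
    and also dedupe within the incoming batch itself."""
    seen = set(existing_keywords)
    new_rows = []
    for row in data:
        kw = row['keyword']
        if kw not in seen:
            seen.add(kw)
            new_rows.append(row)
    return new_rows

# Example: 'b' already exists, and 'a' appears twice in the batch
data = [{'keyword': 'a'}, {'keyword': 'b'}, {'keyword': 'a'}, {'keyword': 'c'}]
print(split_new_keywords(data, existing_keywords={'b'}))
# → [{'keyword': 'a'}, {'keyword': 'c'}]
```

The returned rows are the ones that would go into bulk_create.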
Upvotes: 3
Views: 1590
Reputation: 48922
The best approach will depend on how likely collisions are. If they're rare, then an optimistic concurrency approach using bulk_create
should work fine. Something like:
from django.db import IntegrityError, transaction

while True:
    existing = set(
        KeyO.objects.filter(keyword__in=[x['keyword'] for x in self.data])
        .values_list("keyword", flat=True)
    )
    try:
        # atomic() gives a savepoint so a failed insert doesn't
        # poison any surrounding transaction
        with transaction.atomic():
            KeyO.objects.bulk_create(
                KeyO(...) for x in self.data
                if x['keyword'] not in existing
            )
    except IntegrityError:
        # another worker inserted a conflicting keyword; re-query and retry
        continue
    else:
        break
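The retry pattern above can be exercised without a database; here a fake bulk insert raises on duplicates, the way a unique constraint would (all names here are invented for illustration):

```python
class IntegrityError(Exception):
    pass

db = set()  # stands in for the unique 'keyword' column

def bulk_create(keywords):
    # all-or-nothing, like bulk_create inside transaction.atomic()
    if any(kw in db for kw in keywords):
        raise IntegrityError("duplicate keyword")
    db.update(keywords)

def insert_with_retry(keywords):
    while True:
        existing = set(db)  # the "filter existing" query
        try:
            bulk_create([kw for kw in keywords if kw not in existing])
        except IntegrityError:
            continue  # another worker won the race; re-query and retry
        else:
            break

db.add('b')
insert_with_retry(['a', 'b', 'c'])
print(sorted(db))
# → ['a', 'b', 'c']
```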
If collisions are common, then just using get_or_create
in a loop should work fine. I wouldn't worry prematurely about performance problems until you actually encounter them.
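To illustrate the semantics of that loop, here is a pure-Python stand-in for get_or_create over an in-memory "table" keyed by the unique column (the real call would be KeyO.objects.get_or_create(keyword=...); everything else is made up):

```python
table = {}  # stands in for the KeyO table, keyed by the unique 'keyword'

def get_or_create(keyword, defaults=None):
    """Mimics Django's get_or_create: returns (obj, created)."""
    if keyword in table:
        return table[keyword], False
    obj = {'keyword': keyword, **(defaults or {})}
    table[keyword] = obj
    return obj, True

for row in [{'keyword': 'a'}, {'keyword': 'b'}, {'keyword': 'a'}]:
    obj, created = get_or_create(row['keyword'])
    print(row['keyword'], created)
# → a True / b True / a False
```

Because each row is fetched-or-inserted individually, a collision from another worker only costs one extra query for that row, rather than failing the whole batch.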
Upvotes: 2