Reputation: 2590
I am using get_or_create to insert objects into the database, but the problem is that inserting 1000 at once takes too long.
I tried bulk_create, but it doesn't provide the functionality I need: it creates duplicates, ignores unique constraints, and doesn't trigger the post_save signals I rely on.
Is it even possible to do get_or_create in bulk via a customized SQL query?
Here is my example code:
related_data = json.loads(urllib2.urlopen(final_url).read())
for item in related_data:
    kw = item['keyword']
    e, c = KW.objects.get_or_create(KWuser=kw, author=author)
    e.project.add(id)  # add m2m link to the parent project
related_data contains 1000 rows looking like this:
[{"cmp":0,"ams":3350000,"cpc":0.71,"keyword":"apple."},
{"cmp":0.01,"ams":3350000,"cpc":1.54,"keyword":"apple -10810"}......]
The KW model also sends a signal that I use to create another parent model:
@receiver(post_save, sender=KW)
def grepw(sender, **kwargs):
    if kwargs.get('created', False):
        id = kwargs['instance'].id
        kww = kwargs['instance'].KWuser
        # get or create the parent KeyO, then link it back to the new KW
        a, b = KeyO.objects.get_or_create(defaults={'keyword': kww}, keyword__iexact=kww)
        KW.objects.filter(id=id).update(KWF=a.id)
This works, but as you can imagine, doing thousands of rows at once takes a long time and even crashes my tiny server. What bulk options do I have?
Upvotes: 22
Views: 18885
Reputation: 1099
As of Django 2.2, bulk_create has an ignore_conflicts flag. Per the docs:
On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values
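A minimal sketch of how that flag could apply to the question's models (model and field names are taken from the question; the unique constraint on the keyword/author pair is an assumption). Note that bulk_create never fires post_save signals, so the KeyO bookkeeping from the question would still have to be done separately:

from myapp.models import KW  # hypothetical app path

def bulk_insert_keywords(related_data, author):
    # Rows that violate a unique constraint are silently skipped
    # instead of raising IntegrityError (Django 2.2+).
    KW.objects.bulk_create(
        [KW(KWuser=item['keyword'], author=author) for item in related_data],
        ignore_conflicts=True,
    )
    # Caveats: bulk_create sends no pre_save/post_save signals, and with
    # ignore_conflicts=True the primary keys are not set on the returned
    # objects, so a follow-up query is needed to fetch the saved rows.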
Upvotes: 11
Reputation: 656391
If I understand correctly, get_or_create means SELECT or INSERT on the Postgres side.
You have a table with a UNIQUE constraint or index and a large number of rows to either INSERT (if not yet there) and get the newly created ID, or otherwise SELECT the ID of the existing row. Not as simple as it may seem on the outside. With concurrent write load, the matter is even more complicated. And there are various parameters that need to be defined (how exactly to handle conflicts).
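One way to express SELECT-or-INSERT in a single round trip is PostgreSQL's INSERT ... ON CONFLICT (9.5+). Below is a sketch through Django's raw cursor, assuming a hypothetical table app_kw with columns (kwuser, author_id) and a UNIQUE constraint on that pair; the no-op DO UPDATE makes RETURNING yield the IDs of pre-existing rows as well, which plain DO NOTHING would omit:

from django.db import connection

def bulk_get_or_create_ids(pairs):
    """pairs: list of (keyword, author_id) tuples; returns matching row ids."""
    if not pairs:
        return []
    placeholders = ", ".join(["(%s, %s)"] * len(pairs))
    params = [value for pair in pairs for value in pair]
    sql = (
        "INSERT INTO app_kw (kwuser, author_id) "
        "VALUES " + placeholders + " "
        "ON CONFLICT (kwuser, author_id) "
        # No-op update so RETURNING also yields ids of existing rows;
        # DO NOTHING would return ids only for freshly inserted rows.
        "DO UPDATE SET kwuser = EXCLUDED.kwuser "
        "RETURNING id"
    )
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
        return [row[0] for row in cursor.fetchall()]

Keep in mind that raw SQL bypasses the ORM entirely, so the post_save signal from the question will not fire; any dependent bookkeeping has to be handled explicitly.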
Upvotes: 2
Reputation: 133
This post may be of use to you:
stackoverflow.com/questions/3395236/aggregating-saves-in-django
Note that the answer recommends the commit_on_success decorator, which is deprecated. It has been replaced by the transaction.atomic decorator, documented in Django's database transactions reference:
from django.db import transaction

@transaction.atomic
def lot_of_saves(queryset):
    for item in queryset:
        modify_item(item)
        item.save()
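Applied to the question's loop, a sketch might look like this (names taken from the question; each get_or_create still issues its own queries, but the whole batch commits in one transaction instead of one commit per row):

from django.db import transaction

@transaction.atomic
def import_keywords(related_data, author, project_id):
    # A single commit for the whole batch.
    for item in related_data:
        e, created = KW.objects.get_or_create(KWuser=item['keyword'], author=author)
        e.project.add(project_id)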
Upvotes: 5