Reputation: 73
I have to process an item report CSV file every hour. The CSV contains 150k+ records for one account, and there are multiple accounts in my system. I previously worked with Rails, where an Active Record gem handled this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that helps import such large data in bulk.
So far I have tried this code.
import csv
from decimal import Decimal


class ItemReportService:
    def call(self, file_url):
        with open(file_url, 'r') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)
        self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        # Reuse the existing product if there is one, otherwise build an unsaved one
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        # CSV values are strings, so convert them before adding the two prices
        product.buy_box_price = Decimal(row['Buy Box Item Price']) + \
            Decimal(row['Buy Box Shipping Price'])
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status'
            ]
        )
It raises the following exception, because a new product does not have a primary key assigned to it yet:
ValueError: All bulk_update() objects must have a primary key set.
Upvotes: 4
Views: 10988
Reputation: 91
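Django 4.1 added new parameters to bulk_create(): update_conflicts (a bool), update_fields (a list of field names) and unique_fields (a list of field names).
If your model has a unique field, inserting a row that duplicates an existing value normally raises an IntegrityError. If you set update_conflicts to True, the database instead updates the fields listed in update_fields on the conflicting row. On PostgreSQL and SQLite you must also pass unique_fields, the fields that may be in conflict.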
Upvotes: 6
Reputation: 1150
I wrote this class method, which can be used on any Django model in a project.
from django.db import models


class BaseModel(models.Model):
    class Meta:
        abstract = True  # shared helper only; no table for BaseModel itself

    @classmethod
    def bulk_create_or_update(
        cls, uniques: list[str],
        defaults: list[str],
        data: list[dict]
    ):
        if not data:
            return
        # Index incoming rows by their unique values and build one OR'd lookup
        data_dict, select = {}, models.Q()
        for entry in data:
            sub_entry = {uniq: entry[uniq] for uniq in uniques}
            # Tuple keys avoid collisions such as 'ab' + 'c' vs. 'a' + 'bc'
            data_dict[tuple(str(v) for v in sub_entry.values())] = entry
            select |= models.Q(**sub_entry)

        # Get existing object list
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {
            tuple(str(rec[uniq]) for uniq in uniques): rec
            for rec in records
        }

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]['pk']
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        # Update the rows that already exist
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)
Here is a usage example:
class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)


if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'}
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)
Suggestions for improving the code are welcome.
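One possible refinement, offered as a sketch rather than a tested change to the method: with 150k+ rows, a single call builds one very large OR'd filter, which may become slow or hit database limits, so it can help to feed the rows in fixed-size chunks. The `iter_chunks` helper, the chunk size and the sample CSV below are hypothetical and only illustrate the idea.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` items from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Small in-memory CSV standing in for the real report file.
sample = io.StringIO("name,price,status\np1,50,New\np2,33,Old\np3,10,New\n")
chunks = list(iter_chunks(csv.DictReader(sample), size=2))

# Each chunk could then be handed to the class method, for example:
# Product.bulk_create_or_update(
#     uniques=['name'], defaults=['price', 'status'], data=chunk)
```

Because `iter_chunks` consumes the reader lazily, only one chunk of rows needs to be held in memory at a time when it is used in a `for` loop instead of `list(...)`.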
Upvotes: 0
Reputation: 18
New products are never saved to the database before bulk_update is called, so they have no primary key. One option is to use bulk_create with an additional parameter. Note that ignore_conflicts=True only inserts new rows and skips existing ones without updating them:
Model.objects.bulk_create(self.data, ignore_conflicts=True)
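or create each missing object first, inside the loop over your CSV rows:
columns = ['column1', 'column2']
items_to_be_inserted = []
for row in reader:
    # Look up the object by its unique value, or create it so it has a primary key
    obj = Model.objects.filter(column1=row["column1"]).first()
    if not obj:
        obj = Model.objects.create(column1=row["column1"])
    obj.column1 = row["column1"] or obj.column1
    obj.column2 = row["column2"] or obj.column2
    items_to_be_inserted.append(obj)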
Finally, run the bulk update:
Model.objects.bulk_update(items_to_be_inserted, columns)
This will solve your problem.
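The per-row lookup above issues at least one query per CSV row, which adds up quickly for 150k+ records. A possible alternative, sketched here under assumptions, is to fetch the primary keys of existing rows once and split the CSV rows in plain Python. The `split_rows` helper and the `sku` key are hypothetical names used for illustration.

```python
def split_rows(rows, existing_pks, key="sku"):
    """Split rows into (to_create, to_update) using a prefetched key -> pk map."""
    to_create, to_update = [], []
    for row in rows:
        pk = existing_pks.get(row[key])
        if pk is None:
            to_create.append(row)
        else:
            # Attach the primary key so the row can be used for bulk_update
            to_update.append({**row, "pk": pk})
    return to_create, to_update


# In Django the map could be built with a single query, for example:
# existing_pks = dict(
#     Model.objects.filter(sku__in=skus).values_list("sku", "pk"))
existing_pks = {"A1": 7}
rows = [{"sku": "A1", "price": "12.50"}, {"sku": "B2", "price": "3.00"}]
to_create, to_update = split_rows(rows, existing_pks)
```

The two lists can then be turned into model instances and passed to bulk_create and bulk_update respectively.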
Upvotes: 0