Armaghan Fazal

Reputation: 73

How to bulk create or update in Django

I have to process an item report CSV file every hour. The CSV contains 150k+ records for a single account, and there are multiple accounts in my system. I previously worked with Rails, where an Active Record gem handled this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that helps import such large data in bulk.

So far I have tried this code.

import csv
from decimal import Decimal


class ItemReportService:

    def call(self, file_url):
        with open(file_url, 'r', newline='') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)

            self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        # Reuse the existing row if there is one; otherwise build an
        # unsaved Product, which has no primary key yet.
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        # CSV values are strings, so convert them before adding
        product.buy_box_price = Decimal(row['Buy Box Item Price']) + \
            Decimal(row['Buy Box Shipping Price'])
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        # bulk_update expects model field names, not CSV column names
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status'
            ]
        )

It raises the following exception, because a new product has no primary key assigned to it yet:

ValueError: All bulk_update() objects must have a primary key set.

Upvotes: 4

Views: 10988

Answers (3)

Samuel

Reputation: 91

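Django 4.1 added new parameters to bulk_create(): update_conflicts, update_fields and unique_fields.

Normally, inserting a row that violates a UNIQUE constraint raises an IntegrityError. If you set update_conflicts=True, the conflicting row is updated instead: the fields listed in update_fields are overwritten with the new values. On PostgreSQL and SQLite you must also pass unique_fields, naming the field(s) that identify a conflict; MySQL and MariaDB do not support unique_fields. For the model in the question, assuming sku is declared unique, the call would look like Product.objects.bulk_create(products, update_conflicts=True, unique_fields=['sku'], update_fields=['listing_title', 'listed_price', 'buy_box_price', 'status']).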
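To see what this kind of upsert does at the database level, here is a minimal sketch that uses only the standard-library sqlite3 module rather than Django. The product table, its columns and the upsert_products helper are assumptions made up for the illustration; the ON CONFLICT ... DO UPDATE statement is the same kind of SQL that an update-on-conflict insert relies on in SQLite (3.24+) and PostgreSQL.

```python
import sqlite3


def upsert_products(conn, rows):
    """Insert rows keyed by sku; on a sku conflict, update the other columns."""
    conn.executemany(
        """
        INSERT INTO product (sku, listing_title, listed_price)
        VALUES (:sku, :listing_title, :listed_price)
        ON CONFLICT (sku) DO UPDATE SET
            listing_title = excluded.listing_title,
            listed_price = excluded.listed_price
        """,
        rows,
    )
    conn.commit()


# Hypothetical table standing in for the Product model
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, sku TEXT UNIQUE, "
    "listing_title TEXT, listed_price TEXT)"
)

# First import: A1 is new, so it is inserted
upsert_products(conn, [
    {"sku": "A1", "listing_title": "Old title", "listed_price": "10.00"},
])

# Second import: A1 conflicts and is updated in place, B2 is inserted
upsert_products(conn, [
    {"sku": "A1", "listing_title": "New title", "listed_price": "12.50"},
    {"sku": "B2", "listing_title": "Second item", "listed_price": "5.00"},
])

result = conn.execute(
    "SELECT sku, listing_title, listed_price FROM product ORDER BY sku"
).fetchall()
a1_id = conn.execute("SELECT id FROM product WHERE sku = 'A1'").fetchone()[0]
```

After the second call the table holds two rows, and A1 keeps its original id, because the existing row is updated rather than replaced.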

Upvotes: 6

Rukamakama

Reputation: 1150

I wrote this class method, which can be used on any Django model in a project.

from django.db import models


class BaseModel(models.Model):
    class Meta:
        # No table for BaseModel itself; subclasses inherit the method
        abstract = True

    @classmethod
    def bulk_create_or_update(
            cls, uniques: list[str],
            defaults: list[str],
            data: list[dict]
    ):
        if not data:
            return

        # Index incoming rows by their unique-field values and build one
        # OR-ed filter matching every row that may already exist.
        # Tuple keys avoid collisions such as ('ab', 'c') vs ('a', 'bc').
        data_dict, select = {}, models.Q()
        for entry in data:
            sub_entry = {uniq: entry[uniq] for uniq in uniques}
            key = tuple(str(value) for value in sub_entry.values())
            data_dict[key] = entry
            select |= models.Q(**sub_entry)

        # Map each existing row's unique values to its primary key
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {
            tuple(str(rec[uniq]) for uniq in uniques): rec['pk']
            for rec in records
        }

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        # Update the existing rows (to_update), not the newly created ones
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)

Let's take a usage example:

class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)

if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'}
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)

Any suggestions for improving the code are welcome.
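The split between new and existing rows can also be tried without a database. The following is only a sketch: split_new_and_existing, the account and sku field names, and the existing_pks mapping are hypothetical and stand in for the model lookup. Keys are tuples of the unique-field values rather than one concatenated string, since 'ab' + 'c' and 'a' + 'bc' would otherwise produce the same key.

```python
def split_new_and_existing(rows, unique_fields, existing_pks):
    """Partition row dicts into (to_create, to_update).

    existing_pks maps a tuple of unique-field values to a primary key;
    in Django it could be built from a values() query on those fields.
    """
    to_create, to_update = [], []
    for row in rows:
        key = tuple(row[field] for field in unique_fields)
        if key in existing_pks:
            # Known row: attach its primary key so it can be updated
            to_update.append({**row, "pk": existing_pks[key]})
        else:
            to_create.append(row)
    return to_create, to_update


# One row already exists, with unique values ('ab', 'c') and pk 7
existing_pks = {("ab", "c"): 7}
rows = [
    {"account": "a", "sku": "bc", "price": 50},   # new, despite 'a' + 'bc' == 'ab' + 'c'
    {"account": "ab", "sku": "c", "price": 33},   # existing
]
to_create, to_update = split_new_and_existing(
    rows, ["account", "sku"], existing_pks
)
```

Here the first row goes to to_create and the second to to_update with pk 7 attached.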

Upvotes: 0

fahad

Reputation: 18

You are not saving new products to the database before calling bulk_update, so they have no primary key. I checked your code; for this purpose you can use bulk_create with an additional parameter:

Model.objects.bulk_create(self.data, ignore_conflicts=True)

or

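columns = ['column1', 'column2']
items_to_be_inserted = []
for row in rows:  # rows: the parsed CSV rows
    obj = Model.objects.filter(column1=row["column1"]).first()
    if not obj:
        # Save new rows right away so they get a primary key
        obj = Model.objects.create(column1=row["column1"])
    obj.column1 = row["column1"] or obj.column1
    obj.column2 = row["column2"] or obj.column2
    items_to_be_inserted.append(obj)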

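At the end, you can do the bulk update like this:

Model.objects.bulk_update(items_to_be_inserted, columns)

This avoids the ValueError, because every object is saved, and so has a primary key, before bulk_update runs. Note that this approach issues at least one query per row.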
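One caveat about the ignore_conflicts=True option: conflicting rows are skipped, not updated, so on its own it only covers the "create" half of create-or-update. A minimal sketch of that behaviour with the standard-library sqlite3 module follows; the product table and its columns are assumptions for the illustration, and INSERT OR IGNORE is the SQLite form of an insert that ignores conflicts.

```python
import sqlite3

# Hypothetical table standing in for the model
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, sku TEXT UNIQUE, listed_price TEXT)"
)
conn.execute("INSERT INTO product (sku, listed_price) VALUES ('A1', '10.00')")

# A1 already exists, so its row is skipped and keeps the old price;
# B2 is new, so it is inserted.
conn.executemany(
    "INSERT OR IGNORE INTO product (sku, listed_price) VALUES (?, ?)",
    [("A1", "12.50"), ("B2", "5.00")],
)
conn.commit()

rows_after = conn.execute(
    "SELECT sku, listed_price FROM product ORDER BY sku"
).fetchall()
```

A1 still has the price 10.00 afterwards, so existing rows would need a separate update step.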

Upvotes: 0
