Reputation: 411
I have an Excel file that contains multiple rows with similar data. For example, an employee name is repeated in multiple rows, but I would like to import such records only once, not multiple times, into my database to avoid redundancy. I have seen that the skip_row method may help with this, but I still cannot figure out exactly how to use it since the documentation is very limited. Any help will be appreciated :)
Upvotes: 0
Views: 1238
Reputation: 87
I found a way to skip rows that are already present in the database. One way to do this is to compare a particular field in the database with a column in the Excel file.
In this example, I am assuming that ig_username is a field on the Django model InstagramData, and that ig_username is also a column in the Excel file I want to upload. A row is skipped whenever its ig_username value is already present in the database. I suffered a lot for this answer; I don't want you to as well 😊
from django.contrib import admin
from import_export import resources
from import_export.admin import ImportExportModelAdmin

from .models import InstagramData

class InstagramResource(resources.ModelResource):
    def skip_row(self, instance, original):
        # collect every ig_username already stored in the database
        check = []
        for p in InstagramData.objects.all():
            check.append(p.ig_username)
        # skip this row if its ig_username is already present
        return instance.ig_username in check

    class Meta:
        model = InstagramData

class InstagramDataAdminAdmin(ImportExportModelAdmin, admin.ModelAdmin):
    resource_class = InstagramResource

admin.site.register(InstagramData, InstagramDataAdminAdmin)
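As a side note, the loop above queries the whole table for every imported row. A leaner variant along the same lines (a sketch, assuming ig_username uniquely identifies a record and the django-import-export 2.x-style before_import hook) caches the existing usernames in one query per import:

class InstagramResource(resources.ModelResource):
    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # cache every ig_username already in the database, one query
        self.existing_usernames = set(
            InstagramData.objects.values_list("ig_username", flat=True)
        )

    def skip_row(self, instance, original):
        # skip the row if its ig_username was already in the database
        return instance.ig_username in self.existing_usernames

    class Meta:
        model = InstagramData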
Upvotes: 0
Reputation: 4306
One way to achieve this is to keep a list of already imported values (based on some identifier), and then override skip_row() to ignore any duplicates.
For example:
from import_export import resources

from .models import Book

class _BookResource(resources.ModelResource):
    imported_names = set()

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        # remember each name once its row has been imported
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        # skip any row whose name has already been imported
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')
Then running this will skip any duplicates:
import tablib

# set up 2 unique rows and 1 duplicate
rows = [
    ('book1', '[email protected]', '10.25'),
    ('book2', '[email protected]', '10.25'),
    ('book1', '[email protected]', '10.25'),
]
dataset = tablib.Dataset(*rows, headers=['name', 'author_email', 'price'])

book_resource = _BookResource()
result = book_resource.import_data(dataset)
print(result.totals)
This gives the output:
OrderedDict([('new', 2), ('update', 0), ('delete', 0), ('skip', 1), ('error', 0), ('invalid', 0)])
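If you also need to skip rows that duplicate records already in the database (as in the other answer), the same pattern extends naturally: seed the set from the database before the import starts. A sketch, assuming name is the identifying column and the 2.x-style before_import signature:

class _BookResource(resources.ModelResource):
    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # start with the names already stored, so existing records are skipped too
        self.imported_names = set(Book.objects.values_list("name", flat=True))

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')

Making imported_names an instance attribute set in before_import also avoids the class-level set accumulating names across separate imports in the same process.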
Upvotes: 2