devcloud

Reputation: 411

How to avoid similar rows during Excel import with django-import-export?

I have an Excel file with multiple rows that contain similar data. For example, an employee name is repeated in multiple rows, but I would like to import such records only once, not multiple times, into my database to avoid redundancy. I have seen that the skip_row method may help with this, but I still cannot figure out exactly how to use it since the documentation is very limited. Any help will be appreciated :)

Upvotes: 0

Views: 1238

Answers (2)

Rishabh Gupta

Reputation: 87

I found a way to skip rows that are already present in the database: compare a particular model field against the corresponding column in the Excel file.

In this example, I assume that ig_username is a field on the Django model InstagramData and that ig_username is also a column in the Excel file I want to upload. A row is skipped whenever its ig_username value is already present in the database. I struggled a lot to find this answer, and I don't want you to have to 😊

    from django.contrib import admin
    from import_export import resources
    from import_export.admin import ImportExportModelAdmin

    from .models import InstagramData


    class InstagramResource(resources.ModelResource):
        def skip_row(self, instance, original):
            # Skip this row if its ig_username is already stored in the database.
            existing = InstagramData.objects.values_list('ig_username', flat=True)
            return instance.ig_username in existing

        class Meta:
            model = InstagramData


    class InstagramDataAdmin(ImportExportModelAdmin, admin.ModelAdmin):
        resource_class = InstagramResource


    admin.site.register(InstagramData, InstagramDataAdmin)
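
For reference, here is a minimal sketch of the InstagramData model this resource assumes. Only the ig_username field is required by the code above; the field type and max_length are assumptions, not taken from the answer:

    # models.py -- hypothetical model definition, for illustration only
    from django.db import models


    class InstagramData(models.Model):
        # ig_username is the field used for the duplicate check above
        ig_username = models.CharField(max_length=150)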

Upvotes: 0

Matthew Hegarty

Reputation: 4306

One way to achieve this is to keep a list of already imported values (based on some identifier), and then override skip_row() to ignore any duplicates.

For example:

    from import_export import resources

    from .models import Book  # the example model; adjust the import to your app


    class _BookResource(resources.ModelResource):

        imported_names = set()

        def after_import_row(self, row, row_result, row_number=None, **kwargs):
            # Record the identifier of every row that has been imported.
            self.imported_names.add(row.get("name"))

        def skip_row(self, instance, original):
            # Skip any row whose name has already been imported in this run.
            return instance.name in self.imported_names

        class Meta:
            model = Book
            fields = ('id', 'name', 'author_email', 'price')
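
For completeness, a minimal sketch of the Book model the resource above assumes. The field names come from the Meta.fields tuple; the field types are assumptions for illustration only:

    # models.py -- hypothetical Book model matching the fields used above
    from django.db import models


    class Book(models.Model):
        name = models.CharField(max_length=100)
        author_email = models.EmailField(blank=True)
        price = models.DecimalField(max_digits=10, decimal_places=2, null=True, blank=True)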

Then running this will skip any duplicates:

    import tablib

    # set up 2 unique rows and 1 duplicate
    rows = [
        ('book1', '[email protected]', '10.25'),
        ('book2', '[email protected]', '10.25'),
        ('book1', '[email protected]', '10.25'),
    ]
    dataset = tablib.Dataset(*rows, headers=['name', 'author_email', 'price'])

    book_resource = _BookResource()
    result = book_resource.import_data(dataset)
    print(result.totals)

This gives the output:

    OrderedDict([('new', 2), ('update', 0), ('delete', 0), ('skip', 1), ('error', 0), ('invalid', 0)])

Upvotes: 2
