Reputation: 411
I have an Excel file that contains multiple rows with similar data. For example, an employee name is repeated in multiple rows, but I would like to import such records only once, not multiple times, into my database to avoid redundancy. I have seen that the skip_row method may help with this, but I still cannot figure out exactly how to use it since the documentation is very limited. Any help will be appreciated :)
Upvotes: 0
Views: 1238
Reputation: 87
I found a way to skip rows that are already present in the database. One way to do this is to compare a particular field in the database with a column in the Excel file.
In this example, I am assuming that ig_username is a field on the Django model InstagramData, and that ig_username is also a column in the Excel file I want to upload. A row is skipped whenever its ig_username value is already present in the database. I suffered a lot for this answer; I don't want you to as well 😊
from django.contrib import admin
from import_export import resources
from import_export.admin import ImportExportModelAdmin

from .models import InstagramData

class InstagramResource(resources.ModelResource):
    def skip_row(self, instance, original):
        # collect every ig_username already stored in the database
        check = []
        for p in InstagramData.objects.all():
            check.append(p.ig_username)
        # skip this row if its ig_username is already present
        return instance.ig_username in check

    class Meta:
        model = InstagramData

class InstagramDataAdminAdmin(ImportExportModelAdmin, admin.ModelAdmin):
    resource_class = InstagramResource

admin.site.register(InstagramData, InstagramDataAdminAdmin)
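As a side note, the loop above queries the whole table for every imported row. A leaner variant along the same lines (a sketch, assuming ig_username uniquely identifies a record and the django-import-export 2.x-style before_import hook) caches the existing usernames in one query per import:

class InstagramResource(resources.ModelResource):
    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # cache every ig_username already in the database, one query
        self.existing_usernames = set(
            InstagramData.objects.values_list("ig_username", flat=True)
        )

    def skip_row(self, instance, original):
        # skip the row if its ig_username was already in the database
        return instance.ig_username in self.existing_usernames

    class Meta:
        model = InstagramData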
Upvotes: 0
Reputation: 4306
One way to achieve this is to keep a list of already imported values (based on some identifier), and then override skip_row() to ignore any duplicates.
For example:
from import_export import resources

from .models import Book

class _BookResource(resources.ModelResource):
    imported_names = set()

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        # remember each name once its row has been imported
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        # skip any row whose name has already been imported
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')
Then running this will skip any duplicates:
import tablib

# set up 2 unique rows and 1 duplicate
rows = [
    ('book1', '[email protected]', '10.25'),
    ('book2', '[email protected]', '10.25'),
    ('book1', '[email protected]', '10.25'),
]
dataset = tablib.Dataset(*rows, headers=['name', 'author_email', 'price'])

book_resource = _BookResource()
result = book_resource.import_data(dataset)
print(result.totals)
This gives the output:
OrderedDict([('new', 2), ('update', 0), ('delete', 0), ('skip', 1), ('error', 0), ('invalid', 0)])
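If you also need to skip rows that duplicate records already in the database (as in the other answer), the same pattern extends naturally: seed the set from the database before the import starts. A sketch, assuming name is the identifying column and the 2.x-style before_import signature:

class _BookResource(resources.ModelResource):
    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # start with the names already stored, so existing records are skipped too
        self.imported_names = set(Book.objects.values_list("name", flat=True))

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')

Making imported_names an instance attribute set in before_import also avoids the class-level set accumulating names across separate imports in the same process.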
Upvotes: 2