Josh English

Reputation: 532

Django import-export duplicating rows when importing same file multiple times

I am building a tool that allows users to upload CSV files to update the database using the django-import-export tool. I uploaded my test CSV file with one data row, then uploaded it again and got a duplicated row (with a new primary key but all the other values are the same). The row.import_type value is "updated" but the only thing that is updated is the id.

Then I upload the same file a third time and get an error:

app.models.Role.MultipleObjectsReturned: get() returned more than one Role -- it returned 2!

(I really appreciate the exclamation point in that error message, by the way.)

Ideally I would get a skipped row on the second import and third import of the file. I suppose I'd be okay with an error. The file's contents are:

    Sales Role,System Role,System Plan,id
    Sales Rep,Account Executive,951M-NA,

This is the format users get when they export the CSV dataset. Ideally they would export a file, change a few columns (other than the name, which is the import_id_field), and re-upload the data.

In app/resources.py:

    from import_export import resources
    from import_export.fields import Field

    from .models import Role

    class RoleResource(resources.ModelResource):
        # Map each model attribute to its CSV column header.
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            fields = ('id', 'name', 'default_role', 'default_plan')
            import_id_fields = ('name',)  # look up existing rows by name
            skip_unchanged = True

From what I can tell, on the second import the get_or_init_instance() method isn't finding the object created by the first import, but on the third import it finds both. I haven't customized the import workflow in the resource as described on the Import data workflow page.

What's going wrong here? Do I need to customize the import workflow, or did I miss yet another required attribute in the Resource?

Upvotes: 0

Views: 1456

Answers (2)

Matthew Hegarty

Reputation: 4231

The logic will only skip the row if all declared fields are the same in both the imported row and the persisted object. If any field is different, an update will be performed.

For this to work, the fields you declare in import_id_fields have to be a unique match for a row, otherwise you will get MultipleObjectsReturned.

In your case, if duplicate rows are being created, it must mean that name is not found in the db on the second run. I assume that you have not overridden ModelInstanceLoader and are not running in bulk mode, because either would disrupt the skip row logic.

By default, import_id_fields is set to the row id, so if you can include the id in your export then you are guaranteed a unique match for each row. Obviously the users should not change this field, otherwise you will get duplicates.

The MultipleObjectsReturned error is raised by the instance lookup, which is simply a call to Role.objects.get(name=<n>).
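The lookup-and-skip behaviour described in this answer can be sketched in plain Python. This is a simplified stand-in, not django-import-export's actual code: find_instance, classify, the in-memory table and the "952M-NA" and "Sales Manager" values are all made up for illustration.

```python
# Simplified stand-in for the behaviour described above.
# NOT django-import-export's code; every name here is hypothetical.

class MultipleObjectsReturned(Exception):
    pass


def find_instance(table, row, import_id_fields):
    """Return the one stored record matching row on import_id_fields, or None."""
    matches = [
        rec for rec in table
        if all(rec[f] == row[f] for f in import_id_fields)
    ]
    if len(matches) > 1:
        # import_id_fields did not identify a unique record.
        raise MultipleObjectsReturned(
            f"get() returned more than one record -- it returned {len(matches)}!"
        )
    return matches[0] if matches else None


def classify(table, row, import_id_fields, fields):
    """Report what an import would do with row: 'new', 'skip' or 'update'."""
    existing = find_instance(table, row, import_id_fields)
    if existing is None:
        return "new"
    if all(existing[f] == row[f] for f in fields):
        return "skip"      # every declared field matches
    return "update"        # at least one declared field differs


fields = ("name", "default_role", "default_plan")
table = [{"name": "Sales Rep",
          "default_role": "Account Executive",
          "default_plan": "951M-NA"}]

same = dict(table[0])
changed = dict(table[0], default_plan="952M-NA")   # made-up value
unseen = dict(table[0], name="Sales Manager")      # made-up value

print(classify(table, same, ("name",), fields))     # skip
print(classify(table, changed, ("name",), fields))  # update
print(classify(table, unseen, ("name",), fields))   # new
```

Once the table holds two records with the same name, find_instance raises instead of returning, which mirrors the error seen on the third upload.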

Upvotes: 1

Josh English

Reputation: 532

Here's the surprisingly simple solution:


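    from import_export import resources
    from import_export.fields import Field

    from .models import Role

    class RoleResource(resources.ModelResource):
        # Map each model attribute to its CSV column header.
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            # 'id' is deliberately left out of the imported/exported fields.
            fields = ('name', 'default_role', 'default_plan')
            import_id_fields = ('name',)
            skip_unchanged = True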

All I had to do was remove the 'id' from the fields list in the Meta class and now I get the expected behavior.

I can export this file to CSV (and the id column does not appear), edit the list, and re-upload it. The system skips unchanged rows, updates changed ones, and adds new rows as necessary.
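One plausible reading of why this works, which is an assumption on my part and not confirmed anywhere in this thread: with 'id' in fields, the blank id cell in the CSV overwrites the primary key of the row that was found by name, and saving an object whose primary key is None inserts a new row. The toy simulation below illustrates that idea only. FakeTable and import_row are hypothetical and are not part of django-import-export.

```python
# Toy simulation; FakeTable and import_row are hypothetical, not library code.

class FakeTable:
    """In-memory table whose save() inserts when id is None and updates otherwise."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def save(self, record):
        if record.get("id") is None:   # no primary key: insert a new row
            record["id"] = self.next_id
            self.next_id += 1
        self.rows[record["id"]] = dict(record)

    def get_by_name(self, name):
        matches = [r for r in self.rows.values() if r["name"] == name]
        if len(matches) > 1:
            raise LookupError(f"returned {len(matches)} rows")
        return dict(matches[0]) if matches else None


def import_row(table, row, fields):
    """Look up by name, copy the listed fields from the row, then save."""
    record = table.get_by_name(row["name"]) or {"id": None}
    for f in fields:
        # Assumed behaviour: a blank CSV cell becomes None.
        record[f] = row.get(f) or None
    table.save(record)


row = {"name": "Sales Rep", "default_role": "Account Executive",
       "default_plan": "951M-NA", "id": ""}

with_id = FakeTable()
for _ in range(2):
    import_row(with_id, row, ("id", "name", "default_role", "default_plan"))
print(len(with_id.rows))     # 2: the blank id wiped the key, so save() inserted

without_id = FakeTable()
for _ in range(2):
    import_row(without_id, row, ("name", "default_role", "default_plan"))
print(len(without_id.rows))  # 1: the existing key survived, so save() updated
```

In this model a third import with 'id' in the field list fails at the lookup step, because two rows now share the same name.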

Upvotes: 0
