Axeva
Axeva

Reputation: 4737

Insert or Update to a table with no primary key

I have inherited a table with basic data that essentially looks like this:

person_id   first   last   activity_date
----------------------------------------
98765       Dan     Jones  2018-06-15
12345       Mary    Smith  2018-07-04
12345       Mary    Smith  2018-07-04
21212       Erin    Mark   2018-08-11

The two major challenges are that:

  1. It has no Primary Key, and
  2. There can be duplicate rows

A vendor will provide a CSV with updated data on a regular basis, and I need to add it to this existing table. This introduces two more challenges:

  1. The new file is not a complete replacement for the existing data in the table, and
  2. The new file may overlap part of the existing data in the table.

For example:

person_id   first   last   activity_date
----------------------------------------
12345       Mary    Smith  2018-07-04
12345       Mary    Smith  2018-07-04
21212       Erin    Mark   2018-08-11
87878       Tom     Davis  2018-08-12

So the question becomes, how can I update the table given all of these challenges?

My instinct is to create a duplicate table, add the new data, then use a query to identify the overlaps and the gaps. The lack of an EXCEPT command in MySQL makes this a challenge. So does the fact that there can be duplicate rows.

Can anyone suggest the SQL I may need to identify the missing rows from the CSV that I need to insert into the database?

Upvotes: 1

Views: 2013

Answers (1)

Nick
Nick

Reputation: 147146

I would create a new duplicate table as you describe, adding an auto_increment id field as a primary key:

CREATE TABLE table2
    (id int auto_increment primary key,
     `person_id` int,
     `first` varchar(20),
     `last` varchar(20), 
     `activity_date` date)

Create a unique key on this:

alter table table2 add unique key (`person_id`, `first`, `last`, `activity_date`);

And then insert all your data, using IGNORE to skip all duplicate data:

insert ignore into table2 (`person_id`, `first`, `last`, `activity_date`)
select * from table1

When you get the CSV file from your vendor, use LOAD DATA INFILE, again with the IGNORE option, and that will only import data from the CSV file that does not exist in the database.

Upvotes: 1

Related Questions