Reputation: 4737
I have inherited a table with basic data that essentially looks like this:
person_id first last activity_date ---------------------------------------- 98765 Dan Jones 2018-06-15 12345 Mary Smith 2018-07-04 12345 Mary Smith 2018-07-04 21212 Erin Mark 2018-08-11
The two major challenges are that:
A vendor will provide a CSV with updated data on a regular basis, and I need to add it to this existing table. This introduces two more challenges:
For example:
person_id first last activity_date ---------------------------------------- 12345 Mary Smith 2018-07-04 12345 Mary Smith 2018-07-04 21212 Erin Mark 2018-08-11 87878 Tom Davis 2018-08-12
So the question becomes, how can I update the table given all of these challenges?
My instinct is to create a duplicate table, add the new data, then use a query to identify the overlaps and the gaps. The lack of an EXCEPT
command in MySQL makes this a challenge. So does the fact that there can be duplicate rows.
Can anyone suggest the SQL I may need to identify the missing rows from the CSV that I need to insert into the database?
Upvotes: 1
Views: 2013
Reputation: 147146
I would create a new duplicate table as you describe, adding an auto_increment id field as a primary key:
CREATE TABLE table2
(id int auto_increment primary key,
`person_id` int,
`first` varchar(20),
`last` varchar(20),
`activity_date` date)
Create a unique key on this:
alter table table2 add unique key (`person_id`, `first`, `last`, `activity_date`);
And then insert all your data, using IGNORE
to skip all duplicate data:
insert ignore into table2 (`person_id`, `first`, `last`, `activity_date`)
select * from table1
When you get the CSV file from your vendor, use LOAD DATA INFILE
, again with the IGNORE
option, and that will only import data from the CSV file that does not exist in the database.
Upvotes: 1