Reputation: 3403
If I have this file:
1 someval1
2 someval2
...
and this table in MySQL
-------------
|id | value |
|-----------|
| 1 | NULL |
| 2 | NULL |
...
and my goal is to set the value column of my table to the value in the right-hand column of my CSV file wherever the IDs match, then I can use this query (but I don't want to):
LOAD DATA
LOCAL INFILE 'file'
REPLACE
INTO TABLE myTable
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
(id, value);
The reason I don't want to use REPLACE is that, according to the MySQL docs, it deletes the entire row whenever there is a match on the primary key (in this case, 'id') and reinserts a row with the information in my file. I'm assuming this means that if I were ever to add a column to this table, I'd have to make sure that whatever file I used with this query contained ALL of the column values for every row that might get updated; otherwise the non-targeted columns would lose their values.
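To make that concern concrete, REPLACE behaves like a delete of the conflicting row followed by a fresh insert, so (roughly sketched, not the literal statements MySQL runs internally) each matching line in the file amounts to:
-- illustrative only: the effect of REPLACE for a line "1 someval1"
DELETE FROM myTable WHERE id = 1;
INSERT INTO myTable (id, value) VALUES (1, 'someval1');
-- any column of myTable not covered by the file reverts to its default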
Isn't there a better alternative to updating a table using a local file where one (or maybe more?) columns in the input file contain primary keys that need to be matched?
Upvotes: 0
Views: 4810
Reputation:
One possible approach:
Use LOAD DATA INFILE to load your CSV data into a temporary table. Then, use INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to perform your update.
Edit, with a worked & tested example:
The problem with LOAD DATA INFILE, as identified by the OP, is that it is optimised for speed and is pretty dumb when it comes to handling collisions in incoming data. It has just two options: REPLACE or IGNORE.
REPLACE is a real sledgehammer of a solution. When a duplicate key is found it simply deletes the original row and inserts a new one. The problem here is that any data in an existing row that is not contained in the CSV file is lost.
The alternative is IGNORE, but that fares little better. It doesn't affect the existing data, but when it's done there's no way to tell which lines in the CSV file have been applied and which have not.
LOAD DATA INFILE is great for what it's best at: rapid bulk uploading of new data. For updating existing data it may have its uses, but it's just not the right tool.
We could just write a routine in our favourite language (or the language that we use, even if it's not our favourite - PHP anyone?), parse the CSV file and apply the updates with an INSERT ... ON DUPLICATE KEY UPDATE query, but that introduces an additional layer of scripting and denies MySQL the opportunity to optimise what it's doing. In any case, large numbers of individual INSERT queries are far from efficient. We could use the INSERT syntax that allows multiple value sets (sketched below), but for large imports that potentially runs up against the maximum request size limit.
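For reference, the multiple-value-set form being alluded to would look something like this against the question's myTable; the names come from the question, and batching is what keeps each generated statement under the server's max_allowed_packet limit:
-- illustrative only: one generated statement carrying several rows at once
INSERT INTO myTable (id, value)
VALUES
    (1, 'someval1'),
    (2, 'someval2'),
    (3, 'someval3')
ON DUPLICATE KEY UPDATE value = VALUES(value);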
The solution is to use a two-stage approach and let MySQL do what it does best: import the CSV file into a temporary table with LOAD DATA INFILE, then use an INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to handle the update process in a more intelligent way.
Suppose we have a table like this:
CREATE TABLE `test`.`bestmate` (
`id` INT NOT NULL,
`name` VARCHAR(45) NULL,
`age` INT NULL,
`bestmate` VARCHAR(45) NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `id_UNIQUE` (`id` ASC));
We want to import CSV data on a regular basis from some other source, like this:
id, name, age
1, fred, 16
2, alan, 22
3, george, 26
Our bestmate column is updated by our users online, so we don't have that information in our CSV file, and we don't want to lose it.
First up, create a temporary table to hold our CSV data and import the file.
drop table if exists `csvTable` ;
CREATE TEMPORARY TABLE `csvTable`
(`id` INT NOT NULL,
`name` VARCHAR(45) NULL,
`age` INT NULL);
LOAD DATA LOCAL INFILE 'c:\\users.csv' INTO TABLE `csvTable`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;
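If the CSV layout is at all in doubt, it can be worth a quick look at the staging table before touching the live data (not part of the original example, just a sanity check):
-- confirm the row count and eyeball a few imported rows
SELECT COUNT(*) FROM `csvTable`;
SELECT * FROM `csvTable` LIMIT 5;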
Now we can insert or update the rows in our bestmate table in one command, leaving any extra data in the table untouched.
INSERT `bestmate` (`id`,`name`,`age`) SELECT * from `csvTable`
ON DUPLICATE KEY UPDATE
`bestmate`.`name` = `csvTable`.`name`,
`bestmate`.`age` = `csvTable`.`age`;
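As an aside, an equivalent way to write the update clause (a sketch, not part of the tested example above) uses the VALUES() function, which refers to the value that would have been inserted for that column:
INSERT INTO `bestmate` (`id`, `name`, `age`)
SELECT `id`, `name`, `age` FROM `csvTable`
ON DUPLICATE KEY UPDATE
    `name` = VALUES(`name`),
    `age` = VALUES(`age`);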
Finally, tidy up after ourselves.
drop table if exists `csvTable` ;
Upvotes: 3
Reputation: 2258
You can generate UPDATE statements out of the file and then execute them:
awk "{ print \"UPDATE myTable SET value = '\" \$2 \"' WHERE id = '\" \$1 \"';\" }" < file | mysql db
Upvotes: 1