Reputation: 3403
If I have this file:
1 someval1
2 someval2
...
and this table in MySQL
-------------
|id | value |
|-----------|
| 1 | NULL |
| 2 | NULL |
...
and my goal is to set the value column of my table to the value in the right-hand column of my CSV file wherever the IDs match, then I can use this query (but I don't want to):
LOAD DATA
LOCAL INFILE 'file'
REPLACE
INTO TABLE myTable
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
(id, value);
The reason I don't want to use REPLACE is that, according to the MySQL docs, it deletes the entire row whenever there is a match on the primary key (in this case, 'id') and reinserts a row with the information in my file. I'm assuming this means that if I were ever to add a column to this table, I'd have to make sure that whatever file I used with this query contained ALL of the column values for every row that might get updated; otherwise the non-targeted columns would lose their values.
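To make that concern concrete, REPLACE behaves like a delete of the conflicting row followed by a fresh insert, so (roughly sketched, not the literal statements MySQL runs internally) each matching line in the file amounts to:
-- illustrative only: the effect of REPLACE for a line "1 someval1"
DELETE FROM myTable WHERE id = 1;
INSERT INTO myTable (id, value) VALUES (1, 'someval1');
-- any column of myTable not covered by the file reverts to its default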
Isn't there a better alternative to updating a table using a local file where one (or maybe more?) columns in the input file contain primary keys that need to be matched?
Upvotes: 0
Views: 4810
Reputation:
One possible approach:
Use LOAD DATA INFILE to load your CSV data into a temporary table. Then, use INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to perform your update.
Edit, with a worked & tested example:
The problem with LOAD DATA INFILE, as identified by the OP, is that it is optimised for speed and is pretty dumb when it comes to handling collisions in incoming data. It has just two options: REPLACE or IGNORE.
REPLACE is a real sledgehammer of a solution. When a duplicate key is found it simply deletes the original row and inserts a new one. The problem here is that any data in an existing row that is not contained in the CSV file is lost.
The alternative is IGNORE, but that fares little better. It doesn't affect the existing data, but when it's done there's no way to tell which lines in the CSV file have been applied and which have not.
LOAD DATA INFILE is great for what it's best at: rapid bulk uploading of new data. For updating existing data it may have its uses, but it's just not the right tool.
We could just write a routine in our favourite language (or the language that we use, even if it's not our favourite - PHP anyone?), parse the CSV file and apply the updates with an INSERT ... ON DUPLICATE KEY UPDATE query, but that introduces an additional layer of scripting and denies MySQL the opportunity to optimise what it's doing. In any case, large numbers of individual INSERT queries are far from efficient. We could use the INSERT syntax that allows multiple value sets (sketched below), but for large imports that potentially runs up against the maximum request size limit.
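For reference, the multiple-value-set form being alluded to would look something like this against the question's myTable; the names come from the question, and batching is what keeps each generated statement under the server's max_allowed_packet limit:
-- illustrative only: one generated statement carrying several rows at once
INSERT INTO myTable (id, value)
VALUES
    (1, 'someval1'),
    (2, 'someval2'),
    (3, 'someval3')
ON DUPLICATE KEY UPDATE value = VALUES(value);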
The solution is to use a two-stage approach and let MySQL do what it does best: import the CSV file into a temporary table with LOAD DATA INFILE, then use an INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to handle the update process in a more intelligent way.
Suppose we have a table like this:
CREATE TABLE `test`.`bestmate` (
`id` INT NOT NULL,
`name` VARCHAR(45) NULL,
`age` INT NULL,
`bestmate` VARCHAR(45) NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `id_UNIQUE` (`id` ASC));
We want to import CSV data on a regular basis from some other source, like this:
id, name, age
1, fred, 16
2, alan, 22
3, george, 26
Our bestmate column is updated by our users online, so we don't have that information in our CSV file, and we don't want to lose it.
First up, create a temporary table to hold our CSV data and import the file.
drop table if exists `csvTable` ;
CREATE TEMPORARY TABLE `csvTable`
(`id` INT NOT NULL,
`name` VARCHAR(45) NULL,
`age` INT NULL);
LOAD DATA LOCAL INFILE 'c:\\users.csv' INTO TABLE `csvTable`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;
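If the CSV layout is at all in doubt, it can be worth a quick look at the staging table before touching the live data (not part of the original example, just a sanity check):
-- confirm the row count and eyeball a few imported rows
SELECT COUNT(*) FROM `csvTable`;
SELECT * FROM `csvTable` LIMIT 5;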
Now we can insert or update the rows in our bestmate table in one command, leaving any extra data in the table untouched.
INSERT `bestmate` (`id`,`name`,`age`) SELECT * from `csvTable`
ON DUPLICATE KEY UPDATE
`bestmate`.`name` = `csvTable`.`name`,
`bestmate`.`age` = `csvTable`.`age`;
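As an aside, an equivalent way to write the update clause (a sketch, not part of the tested example above) uses the VALUES() function, which refers to the value that would have been inserted for that column:
INSERT INTO `bestmate` (`id`, `name`, `age`)
SELECT `id`, `name`, `age` FROM `csvTable`
ON DUPLICATE KEY UPDATE
    `name` = VALUES(`name`),
    `age` = VALUES(`age`);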
Finally, tidy up after ourselves.
drop table if exists `csvTable` ;
Upvotes: 3
Reputation: 2258
You can generate UPDATE statements out of the file and then execute them:
awk "{ print \"UPDATE myTable SET value = '\" \$2 \"' WHERE id = '\" \$1 \"';\" }" < file | mysql db
Upvotes: 1