stepanian
stepanian

Reputation: 11433

LOAD XML LOCAL INFILE with Inconsistent Column Names

MySQL has a nice statement: LOAD XML LOCAL INFILE

For example, if you have this table:

 CREATE TABLE person (
    person_id INT NOT NULL PRIMARY KEY,
    fname VARCHAR(40) NULL,
    lname VARCHAR(40) NULL
 );

and the following XML file called person.xml:

<list>
      <person>
          <person_id>1</person_id>
          <fname>Mikael</fname>
          <lname>Ronström</lname>
      </person>
      <person>
          <person_id>2</person_id>
          <fname>Lars</fname>
          <lname>Thalmann</lname>
      </person>
</list>

You can do this:

LOAD XML LOCAL INFILE 'person.xml'
INTO TABLE person
ROWS IDENTIFIED BY '<person>';

My question is, what if the column names were different in the XML file than they are in the table? For example:

<list>
      <person>
          <PersonId>1</PersonId>
          <FirstName>Mikael</FirstName>
          <LastName>Ronström</LastName>
      </person>
      <person>
          <PersonId>2</PersonId>
          <FirstName>Lars</FirstName>
          <LastName>Thalmann</LastName>
      </person>
</list>

How can you accomplish the same thing with a MySQL statement without manipulating the XML file? I searched everywhere but couldn't find an answer.

Upvotes: 8

Views: 13222

Answers (5)

Gustik
Gustik

Reputation: 323

mysql table schema: organization_type(id, name)

organizationtype.xml:

<NewDataSet>
    <row>
      <ItemID>1</ItemID>
      <ItemCreatedBy>53</ItemCreatedBy>
      <ItemCreatedWhen>2014-03-10T22:53:43.947+10:00</ItemCreatedWhen>
      <ItemModifiedBy>53</ItemModifiedBy>
      <ItemModifiedWhen>2014-03-10T22:53:43.99+10:00</ItemModifiedWhen>
      <ItemOrder>1</ItemOrder>
      <ItemGUID>e2ad051f-b7ea-4feb-b91e-f558f6f632a0</ItemGUID>
      <Name>Company Type 1</Name>
    </row>

and the mysql import query will look like this:

LOAD XML INFILE '/var/lib/mysql-files/organizationtype.xml'
INTO TABLE organization_type (@ItemID, @Name) 
SET id=@ItemID, name=@Name

Upvotes: 2

stepanian
stepanian

Reputation: 11433

The following were the options available to me:

Option 1: Create a temporary table with different field names (as suggested by the other answers). This would have been a satisfactory approach. However, when I tried it, a new problem emerged: the LOAD XML statement does not, for some reason, accept minimized format empty elements (for example <person />). So, the statement failed because the XML files I need to load occasionally have empty elements in that format.

Option 2: Transform the XML file with XSLT before running the LOAD XML statement to change the element names and modify the empty element formats. This was not feasible because the XML files are very large and XSLT processing engines load the entire XML into memory before processing.

Option 3: Bypass the LOAD XML statement entirely and use a SAX parser to parse the XML file and insert the records directly into the database using JDBC and prepared statements. Even though raw JDBC and prepared statements are generally efficient, this proved to be too slow. MUCH slower than the LOAD XML statement.

Option 4: Use the LOAD DATA statement instead of the LOAD XML statement and play around with the optional clauses associated with that statement to fit my needs (e.g. lines separated by, etc.). This could have worked but would have been error prone and unstable.

Option 5: Parse the file with a fast forward-only parser and read/write XML elements simultaneously and generate a new XML file with the modified names in the desired format for the LOAD XML statement.

I ended up using option 5. I used the Java Streaming API for XML (StAX) for both reading the XML file and generating the modified XML file and then running the LOAD XML LOCAL INFILE through JDBC from inside the web application. It works perfectly and it is super fast.

Upvotes: 4

newtover
newtover

Reputation: 32094

A little bit hacky but working solution using the good old LOAD DATA INFILE:

LOAD DATA LOCAL INFILE '/tmp/xml/loaded.xml'
INTO TABLE person
CHARACTER SET binary
LINES STARTING BY '<person>' TERMINATED BY '</person>'
(@person)
SET
  person_id = ExtractValue(@person:=CONVERT(@person using utf8), 'PersonId'),
  fname = ExtractValue(@person, 'FirstName'),
  lname = ExtractValue(@person, 'LastName')
;

P.S. You will probably need to additionaly play with field delimiter if the data contains commas.

Upvotes: 4

Bill Karwin
Bill Karwin

Reputation: 562358

The fields in the XML file that don't correspond to physical column names are ignored. And columns in the table that don't have corresponding fields in the XML are set NULL.

What I'd do is load into a temp table as @Kolink suggests but with additional columns. Add a SET clause as you load the data from XML.

CREATE TEMP TABLE person_xml LIKE person;

ALTER TABLE person_xml 
  ADD COLUMN FirstName VARCHAR(40),
  ADD COLUMN LastName  VARCHAR(40),
  ADD COLUMN PersonId  INT;

LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml
  SET person_id = PersonId, fname = FirstName, lname = LastName;

SELECT * FROM person_xml;
+-----------+--------+-------------+-----------+-------------+----------+
| person_id | fname  | lname       | FirstName | LastName    | PersonId |
+-----------+--------+-------------+-----------+-------------+----------+
|         1 | Mikael | Ronström    | Mikael    | Ronström    |        1 |
|         2 | Lars   | Thalmann    | Lars      | Thalmann    |        2 |
+-----------+--------+-------------+-----------+-------------+----------+

Then copy to the real table, selecting a subset of columns.

INSERT INTO person SELECT person_id, fname, lname FROM person_xml;

Alternatively, drop the extra columns and use SELECT *.

ALTER TABLE person_xml 
  DROP COLUMN PersonId, 
  DROP COLUMN FirstName, 
  DROP COLUMN LastName;

INSERT INTO person SELECT * FROM person_xml;

SELECT * FROM person;

+-----------+--------+-------------+
| person_id | fname  | lname       |
+-----------+--------+-------------+
|         1 | Mikael | Ronström    |
|         2 | Lars   | Thalmann    |
+-----------+--------+-------------+

Upvotes: 12

Niet the Dark Absol
Niet the Dark Absol

Reputation: 324650

You could create a temporary table using the column names from the XML file (although this would have to be done manually in the create temporary table query), load the XML file into that table, then insert into person select * from tmp_table_name.

Upvotes: 2

Related Questions