Reputation: 11433
MySQL has a nice statement: LOAD XML LOCAL INFILE
For example, if you have this table:
CREATE TABLE person (
person_id INT NOT NULL PRIMARY KEY,
fname VARCHAR(40) NULL,
lname VARCHAR(40) NULL
);
and the following XML file called person.xml:
<list>
<person>
<person_id>1</person_id>
<fname>Mikael</fname>
<lname>Ronström</lname>
</person>
<person>
<person_id>2</person_id>
<fname>Lars</fname>
<lname>Thalmann</lname>
</person>
</list>
You can do this:
LOAD XML LOCAL INFILE 'person.xml'
INTO TABLE person
ROWS IDENTIFIED BY '<person>';
My question is, what if the column names were different in the XML file than they are in the table? For example:
<list>
<person>
<PersonId>1</PersonId>
<FirstName>Mikael</FirstName>
<LastName>Ronström</LastName>
</person>
<person>
<PersonId>2</PersonId>
<FirstName>Lars</FirstName>
<LastName>Thalmann</LastName>
</person>
</list>
How can you accomplish the same thing with a MySQL statement without manipulating the XML file? I searched everywhere but couldn't find an answer.
Upvotes: 8
Views: 13222
Reputation: 323
mysql table schema: organization_type(id, name)
organizationtype.xml:
<NewDataSet>
<row>
<ItemID>1</ItemID>
<ItemCreatedBy>53</ItemCreatedBy>
<ItemCreatedWhen>2014-03-10T22:53:43.947+10:00</ItemCreatedWhen>
<ItemModifiedBy>53</ItemModifiedBy>
<ItemModifiedWhen>2014-03-10T22:53:43.99+10:00</ItemModifiedWhen>
<ItemOrder>1</ItemOrder>
<ItemGUID>e2ad051f-b7ea-4feb-b91e-f558f6f632a0</ItemGUID>
<Name>Company Type 1</Name>
</row>
and the mysql import query will look like this:
LOAD XML INFILE '/var/lib/mysql-files/organizationtype.xml'
INTO TABLE organization_type (@ItemID, @Name)
SET id=@ItemID, name=@Name
Upvotes: 2
Reputation: 11433
The following were the options available to me:
Option 1: Create a temporary table with different field names (as suggested by the other answers). This would have been a satisfactory approach. However, when I tried it, a new problem emerged: the LOAD XML statement does not, for some reason, accept minimized format empty elements (for example <person />
). So, the statement failed because the XML files I need to load occasionally have empty elements in that format.
Option 2: Transform the XML file with XSLT before running the LOAD XML statement to change the element names and modify the empty element formats. This was not feasible because the XML files are very large and XSLT processing engines load the entire XML into memory before processing.
Option 3: Bypass the LOAD XML statement entirely and use a SAX parser to parse the XML file and insert the records directly into the database using JDBC and prepared statements. Even though raw JDBC and prepared statements are generally efficient, this proved to be too slow. MUCH slower than the LOAD XML statement.
Option 4: Use the LOAD DATA statement instead of the LOAD XML statement and play around with the optional clauses associated with that statement to fit my needs (e.g. lines separated by, etc.). This could have worked but would have been error prone and unstable.
Option 5: Parse the file with a fast forward-only parser and read/write XML elements simultaneously and generate a new XML file with the modified names in the desired format for the LOAD XML statement.
I ended up using option 5. I used the Java Streaming API for XML (StAX) for both reading the XML file and generating the modified XML file and then running the LOAD XML LOCAL INFILE through JDBC from inside the web application. It works perfectly and it is super fast.
Upvotes: 4
Reputation: 32094
A little bit hacky but working solution using the good old LOAD DATA INFILE:
LOAD DATA LOCAL INFILE '/tmp/xml/loaded.xml'
INTO TABLE person
CHARACTER SET binary
LINES STARTING BY '<person>' TERMINATED BY '</person>'
(@person)
SET
person_id = ExtractValue(@person:=CONVERT(@person using utf8), 'PersonId'),
fname = ExtractValue(@person, 'FirstName'),
lname = ExtractValue(@person, 'LastName')
;
P.S. You will probably need to additionaly play with field delimiter if the data contains commas.
Upvotes: 4
Reputation: 562358
The fields in the XML file that don't correspond to physical column names are ignored. And columns in the table that don't have corresponding fields in the XML are set NULL.
What I'd do is load into a temp table as @Kolink suggests but with additional columns. Add a SET
clause as you load the data from XML.
CREATE TEMP TABLE person_xml LIKE person;
ALTER TABLE person_xml
ADD COLUMN FirstName VARCHAR(40),
ADD COLUMN LastName VARCHAR(40),
ADD COLUMN PersonId INT;
LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml
SET person_id = PersonId, fname = FirstName, lname = LastName;
SELECT * FROM person_xml;
+-----------+--------+-------------+-----------+-------------+----------+
| person_id | fname | lname | FirstName | LastName | PersonId |
+-----------+--------+-------------+-----------+-------------+----------+
| 1 | Mikael | Ronström | Mikael | Ronström | 1 |
| 2 | Lars | Thalmann | Lars | Thalmann | 2 |
+-----------+--------+-------------+-----------+-------------+----------+
Then copy to the real table, selecting a subset of columns.
INSERT INTO person SELECT person_id, fname, lname FROM person_xml;
Alternatively, drop the extra columns and use SELECT *
.
ALTER TABLE person_xml
DROP COLUMN PersonId,
DROP COLUMN FirstName,
DROP COLUMN LastName;
INSERT INTO person SELECT * FROM person_xml;
SELECT * FROM person;
+-----------+--------+-------------+
| person_id | fname | lname |
+-----------+--------+-------------+
| 1 | Mikael | Ronström |
| 2 | Lars | Thalmann |
+-----------+--------+-------------+
Upvotes: 12
Reputation: 324650
You could create a temporary table using the column names from the XML file (although this would have to be done manually in the create temporary table
query), load the XML file into that table, then insert into person select * from tmp_table_name
.
Upvotes: 2