Reputation: 23
I want to generate an SQL INSERT statement from data encoded in XML files, using an XSLT.
My XML files include, for example, the following tags (describing a tombstone):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Funerary inscription for Marcellus, a smith</title>
</titleStmt>
<publicationStmt>
<authority>I.Sicily</authority>
<idno type="TM">491539</idno>
</publicationStmt>
<sourceDesc>
<msDesc>
<msIdentifier>
<country>Italy</country>
<region>Sicily</region>
<settlement>Catania</settlement>
<repository role="museum" ref="http://sicily.classics.ox.ac.uk/museum/018"
>Museo Civico di Catania </repository>
<idno type="inventory">390</idno>
<altIdentifier>
<settlement/>
<repository/>
<idno type="old"/>
</altIdentifier>
</msIdentifier>
<msContents>
<textLang mainLang="la">Latin </textLang>
</msContents>
<physDesc>
<objectDesc>
<supportDesc>
<support>
<material n="marble"
ref="http://www.eagle-network.eu/voc/material/lod/48.html"
>marble </material>
<objectType n="tabula"
ref="http://www.eagle-network.eu/voc/objtyp/lod/257">tablet </objectType>
<dimensions>
<height unit="cm">29</height>
<width unit="cm">33.5</width>
<depth unit="cm">2.1</depth>
</dimensions>
</support>
</supportDesc>
</objectDesc>
</physDesc>
</msDesc>
</sourceDesc>
</fileDesc>
</teiHeader>
<!-- lots more content that I cut away here -->
</TEI>
I would like to extract information from this and insert it into an SQL table.
My desired output would look like this:
INSERT INTO tblObjects
(ObjectID, Title, TMid, Material, ObjectType, Height)
VALUES
('Funerary inscription for Marcellus, a smith', 491539, 'marble', 'tabula', 29);
My original - now solved - problem was this:
I tried setting the output method to text after some examples I found online, but this is giving me the error message "Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'."
I changed the file format to xsl, now it's no longer complaining about the non-whitespace characters. I can now put text into the final output. The following (inspired by your answers) is the current state of my XSLT, for now I want to try to just insert the value of the title, the position of which is "TEI/teiHeader/fileDesc/titleStmt/title":
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"
encoding="UTF-8"
omit-xml-declaration="yes"
indent="no"/>
<xsl:template match="/">
<xsl:text>INSERT INTO tblObjects</xsl:text>
<xsl:text>(ObjectID, Title, TMid, Material, ObjectType, Height) VALUES (</xsl:text>
<xsl:apply-templates select="root"/>
<xsl:text>);</xsl:text>
</xsl:template>
<xsl:template match="root">
<xsl:value-of select="TEI/teiHeader/fileDesc/titleStmt/title"/>
</xsl:template>
</xsl:stylesheet>
This gives me the following output:
INSERT INTO tblObjects(ObjectID, Title, TMid, Material, ObjectType, Height) VALUES ();
Yet, as you can see, it does not insert the value of the title. I am not sure why that's not working (only trying the title for now).
Upvotes: 2
Views: 1523
Reputation: 107687
Simply walk down the tree with templates and concatenate values with needed quotes and commas for SQL query's VALUES
clause. Due to the many nested structure of XML, ancestor::*
and descendant::*
paths are used for specific node value extraction.
Note: This solution works for XML files for one teiHeader. You will need to tailor this solution or run other XSLT scripts for other types.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://www.tei-c.org/ns/1.0">
<xsl:output method="text" encoding="UTF-8" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/doc:TEI">
<xsl:text>INSERT INTO tblObjects</xsl:text>
<xsl:text>(ObjectID, Title, TMid, Material, ObjectType, Height)
VALUES
</xsl:text>
<xsl:apply-templates select="doc:teiHeader/doc:fileDesc/doc:sourceDesc/doc:msDesc/doc:physDesc"/>
</xsl:template>
<xsl:template match="doc:physDesc">
<xsl:variable name="quote">'</xsl:variable>
<xsl:value-of select="concat('(',
$quote, ancestor::doc:fileDesc/doc:titleStmt/doc:title, $quote, ', ',
ancestor::doc:fileDesc/doc:publicationStmt/doc:idno[@type='TM'], ', ',
$quote, normalize-space(descendant::doc:material), $quote, ', ',
$quote, normalize-space(descendant::doc:objectType), $quote, ', ',
descendant::doc:dimensions/doc:height,
')'
)"/>
</xsl:template>
</xsl:stylesheet>
Upvotes: 1