Reputation: 2815
I'm looking to do some sort of transform from INI to XML. The INI syntax is simple. I'm not looking to use sed/awk/grep; this really should be done with XML tools.
Can this be done with regular XSL? I have heard of Xflat, but can I do that with tools compiled in C, such as xsltproc or xmlstarlet?
Generic INI syntax is like this...
[section]
option = values
which would be in xml like this...
<section>
<option>values</option>
</section>
Any help would be very appreciated.
Upvotes: 1
Views: 3549
Reputation: 66714
Yes, you can parse a plain-text file in XSLT.
It would probably be easier to do so in XSLT 2.0, if that is an option for you.
In XSLT 2.0: you can use the unparsed-text() function to read the file and tokenize() to split it into lines.
<xsl:for-each select="tokenize(unparsed-text($in), '\r?\n')">
...
</xsl:for-each>
In XSLT 1.0: you can read many flat text files by incorporating them into an XML file, referencing each text file through an external entity (as long as the text does not contain any characters or patterns, such as a bare < or &, that would result in XML parsing errors). The text from the file is included in the XML document as it is parsed.
<!DOCTYPE foo [
<!ENTITY bar SYSTEM "bar.txt">
]>
<foo>
&bar;
</foo>
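Once the text is inside the wrapper document, an XSLT 1.0 stylesheet can split it with recursive named templates. The following is only a minimal sketch under some assumptions: the wrapper is the foo/&bar; document shown above, the root element name ini and the template names sections and options are made up for illustration, and the INI file is well-formed (every option belongs to a section, section and option names are valid XML names, and neither names nor values contain [ or ]).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- The text of the wrapper element is the INI content pulled in by the entity. -->
  <xsl:template match="/">
    <ini> <!-- hypothetical root element, added so the output has a single root -->
      <xsl:call-template name="sections">
        <xsl:with-param name="text" select="string(/*)"/>
      </xsl:call-template>
    </ini>
  </xsl:template>

  <!-- Emits one element per [section], then recurses on the text after it. -->
  <xsl:template name="sections">
    <xsl:param name="text"/>
    <xsl:if test="contains($text, '[')">
      <xsl:variable name="rest" select="substring-after($text, '[')"/>
      <xsl:variable name="name" select="normalize-space(substring-before($rest, ']'))"/>
      <xsl:variable name="body" select="substring-after($rest, ']')"/>
      <xsl:element name="{$name}">
        <!-- Only the lines up to the next '[' belong to this section. -->
        <xsl:call-template name="options">
          <xsl:with-param name="text"
              select="substring-before(concat($body, '['), '[')"/>
        </xsl:call-template>
      </xsl:element>
      <xsl:call-template name="sections">
        <xsl:with-param name="text" select="$body"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Processes one line per call; lines without '=' are skipped. -->
  <xsl:template name="options">
    <xsl:param name="text"/>
    <xsl:variable name="line"
        select="substring-before(concat($text, '&#10;'), '&#10;')"/>
    <xsl:variable name="rest" select="substring-after($text, '&#10;')"/>
    <xsl:if test="contains($line, '=')">
      <xsl:element name="{normalize-space(substring-before($line, '='))}">
        <xsl:value-of select="normalize-space(substring-after($line, '='))"/>
      </xsl:element>
    </xsl:if>
    <xsl:if test="string-length($rest) &gt; 0">
      <xsl:call-template name="options">
        <xsl:with-param name="text" select="$rest"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the stylesheet saved as, say, ini2xml.xsl and the wrapper as wrapper.xml (both file names are placeholders), something like xsltproc ini2xml.xsl wrapper.xml should run it, provided the processor is allowed to load external entities. Because the recursion goes one level deeper per line, a very long section may run into the processor's recursion depth limit.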
Upvotes: 2
Reputation: 243469
Can this be done with regular XSL?
Yes, and XSLT 2.0 provides more facilities than XSLT 1.0 for processing text. Very complex text processing has been implemented in XSLT, including a general LR(1) parser, used for building parsers for specific grammars, such as JSON and XPath.
In particular, learn about unparsed-text(), the various string functions, including the ones that allow using regular expressions (matches(), tokenize() and replace()), and also the <xsl:analyze-string> instruction.
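As a rough illustration of how these pieces fit together, here is a small XSLT 2.0 sketch that uses <xsl:analyze-string> with capture groups to pick apart "option = value" lines. It is deliberately partial: it ignores section headers and simply lists every option under a single element. The parameter name ini-uri, its default value test.ini, and the root element name options are assumptions made for this example, and option names are assumed to be valid XML names.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the INI file, resolved relative to the stylesheet. -->
  <xsl:param name="ini-uri" select="'test.ini'"/>

  <xsl:template match="/">
    <options> <!-- hypothetical root element -->
      <!-- Read the file as text and visit it line by line. -->
      <xsl:for-each select="tokenize(unparsed-text($ini-uri), '\r?\n')">
        <!-- Group 1 is the option name, group 2 the value with outer spaces trimmed. -->
        <xsl:analyze-string select="."
            regex="^\s*([^=\[\s]+)\s*=\s*(.*?)\s*$">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
          <!-- Lines that do not match (section headers, blank lines) produce nothing. -->
        </xsl:analyze-string>
      </xsl:for-each>
    </options>
  </xsl:template>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor; xsltproc implements XSLT 1.0 only. Note that the regex attribute is an attribute value template, so any literal curly braces in a pattern would have to be doubled.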
XSLT 1.0 also has string functions (as provided by XPath 1.0); however, it lacks regular expression capabilities and has nothing like the XSLT 2.0 function unparsed-text(). Among the most useful XPath 1.0 string functions are: substring(), substring-before(), substring-after(), starts-with(), string-length(), concat(), and especially the translate() function.
One can "read" a file by using an entity in a DTD, as Mads Hansen has explained in his answer. Another way is to read the file in the program that initiates the transformation, then to pass the file's content as a string parameter to the transformation.
Update: The OP has now provided specific data, so that a complete solution is possible:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vText" select=
"unparsed-text('file:///c:/temp/delete/test.ini')"/>
<xsl:variable name="vLines" as="xs:string*" select=
"tokenize($vText, '&#xD;?&#xA;')[.]"/>
<xsl:variable name="vLineCnt" select="count($vLines)"/>
<xsl:variable name="vSectLinesInds" as="xs:integer*" select=
"for $i in 1 to $vLineCnt
return
if(starts-with(normalize-space($vLines[$i]), '['))
then $i
else ()
"/>
<xsl:variable name="vSectCnt" select="count($vSectLinesInds)"/>
<xsl:template match="/">
<xsl:for-each select="$vSectLinesInds">
<xsl:variable name="vPos" select="position()"/>
<xsl:variable name="vInd" as="xs:integer" select="."/>
<xsl:variable name="vthisLine" as="xs:string"
select="$vLines[$vInd]"/>
<xsl:variable name="vNextSectInd" select=
"if($vPos eq $vSectCnt)
then
$vLineCnt +1
else
$vSectLinesInds[$vPos +1]
"/>
<xsl:variable name="vInnerLines" select=
"$vLines
[position() gt current()
and
position() lt $vNextSectInd
]
"/>
<xsl:variable name="vName" select=
"tokenize($vthisLine, '\[|\]')[2]"/>
<xsl:element name="{$vName}">
<xsl:for-each select="$vInnerLines">
<xsl:variable name="vInnerParts" select=
"tokenize(., '[ ]*=[ ]*')"/>
<xsl:element name="{$vInnerParts[1]}">
<xsl:value-of select="$vInnerParts[2]"/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to any XML document (which is not used), and the file at C:\temp\delete\test.ini has the following content:
[section1]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5
[section2]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5
[section3]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5
the wanted, correct result is produced:
<section1>
<option1>values1</option1>
<option2>values2</option2>
<option3>values3</option3>
<option4>values4</option4>
<option5>values5</option5>
</section1>
<section2>
<option1>values1</option1>
<option2>values2</option2>
<option3>values3</option3>
<option4>values4</option4>
<option5>values5</option5>
</section2>
<section3>
<option1>values1</option1>
<option2>values2</option2>
<option3>values3</option3>
<option4>values4</option4>
<option5>values5</option5>
</section3>
Upvotes: 4
Reputation: 2998
If it's possible for you to use an XSLT 2.0 processor, you've got the unparsed-text() function, which can import flat files.
Once the file is imported, you have the traditional string tools of XPath 2.0 to handle your data (regex, translate...); see http://www.w3.org/TR/xpath-functions/#string-functions.
Upvotes: 1