J. M. Becker

Reputation: 2815

Transform INI (or any generic legacy flat file) to XML? With XSL, from xmlstarlet or xsltproc?

I'm looking to do some sort of transform from INI to XML; the INI syntax is simple. I'm not looking to use sed/awk/grep; this really should be done with XML tools.

Can this be done with regular XSL? I have heard of Xflat, but can I do that with tools compiled in C, such as xsltproc or xmlstarlet?

Generic INI syntax is like this...

[section]
option = values

which would look like this in XML...

<section>
<option>values</option>
</section>

Any help would be very appreciated.

Upvotes: 1

Views: 3549

Answers (3)

Mads Hansen

Reputation: 66714

Yes, you can parse a plain-text file in XSLT.

It would probably be easier to do so in XSLT 2.0, if that is an option for you.

In XSLT 2.0: you can use the unparsed-text() function to read the file and tokenize() to split it into lines.

<xsl:for-each select="tokenize(unparsed-text($in), '\r?\n')">
 <!-- process the current line here; "." is the line, as an xs:string -->
</xsl:for-each>
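As a rough sketch of how that loop could be filled out for INI data, the lines can first be copied into temporary line elements and then grouped with xsl:for-each-group, starting a new group at each "[section]" line. This assumes an XSLT 2.0 processor (xsltproc and xmlstarlet are built on libxslt, which only implements XSLT 1.0), a file whose first non-blank line is a section header, and no comment lines. The default value test.ini for the in parameter is a placeholder; a relative URI is resolved against the stylesheet's location.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Placeholder: URI of the INI file to read -->
  <xsl:param name="in" select="'test.ini'"/>

  <xsl:template match="/">
    <!-- Step 1: copy every non-blank line into a temporary <line> element -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(unparsed-text($in), '\r?\n')[normalize-space()]">
        <line><xsl:value-of select="normalize-space()"/></line>
      </xsl:for-each>
    </xsl:variable>

    <!-- Step 2: start a new group at every "[section]" line -->
    <xsl:for-each-group select="$lines/line"
        group-starting-with="line[starts-with(., '[')]">
      <!-- The context item is the group's first line, i.e. the header -->
      <xsl:element name="{substring-before(substring-after(., '['), ']')}">
        <!-- Remaining lines of the group are expected to be "name = value" -->
        <xsl:for-each select="current-group()[position() gt 1][contains(., '=')]">
          <xsl:element name="{normalize-space(substring-before(., '='))}">
            <xsl:value-of select="normalize-space(substring-after(., '='))"/>
          </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

The detour through line elements is there because, in XSLT 2.0, group-starting-with only accepts a sequence of nodes, not plain strings.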

In XSLT 1.0: you can read many flat text files by incorporating them into an XML file, referencing the text file with an external entity (as long as the file does not contain any characters or patterns that would cause XML parsing errors, such as a bare < or &). The text from the file will be included in the XML file as it is parsed.

<!DOCTYPE foo [
<!ENTITY bar SYSTEM "bar.txt">
]>
<foo>
&bar;
</foo>

Upvotes: 2

Dimitre Novatchev

Reputation: 243469

Can this be done with regular XSL?

Yes, and XSLT 2.0 provides more facilities than XSLT 1.0 for processing text. Very complex text processing has been implemented in XSLT, including a general LR(1) parser, used for building parsers for specific grammars, such as JSON and XPath.

In particular, learn about unparsed-text(), the various string functions, including those that work with regular expressions (matches(), tokenize() and replace()), and the <xsl:analyze-string> instruction.
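For instance, a minimal sketch of xsl:analyze-string turning "name = value" lines into elements might look like this. The lines parameter and its sample values are made up for illustration; in practice the lines would come from tokenize() applied to the result of unparsed-text(). Lines that do not match the regular expression, such as the [section1] header here, produce no output because there is no xsl:non-matching-substring branch.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical sample lines, normally produced by tokenize() over unparsed-text() -->
  <xsl:param name="lines" as="xs:string*"
             select="('[section1]', 'option1 = values1', '  option2=values2  ')"/>

  <xsl:template match="/">
    <xsl:for-each select="$lines">
      <!-- group 1: option name (no '=' or whitespace); group 2: value without surrounding spaces -->
      <xsl:analyze-string select="." regex="^\s*([^=\s]+)\s*=\s*(.*?)\s*$">
        <xsl:matching-substring>
          <xsl:element name="{regex-group(1)}">
            <xsl:value-of select="regex-group(2)"/>
          </xsl:element>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that the regex attribute of xsl:analyze-string is an attribute value template, so any literal curly braces in the pattern would have to be doubled.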

XSLT 1.0 also has string functions (as provided by XPath 1.0); however, it lacks regular expression capabilities and has no equivalent of the XSLT 2.0 function unparsed-text(). Among the most useful XPath 1.0 string functions are: substring(), substring-before(), substring-after(), starts-with(), string-length(), concat(), and especially the translate() function.

One can "read" a file by using an entity in a DTD, as Mads Hansen has explained in his answer. Another way is to read the file in the program that initiates the transformation, then to pass the file's content as a string parameter to the transformation.
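Since xsltproc and xmlstarlet are built on libxslt, which implements XSLT 1.0, here is a rough XSLT 1.0 sketch that uses only the XPath 1.0 string functions listed above plus recursive named templates. It supports both ways of getting the text in: by default it reads the text content of the source document's root element (the external-entity wrapper from Mads Hansen's answer), and the top-level parameter ini can instead be supplied as a string parameter. Assumptions: section headers start at the very beginning of a line, there are no comment lines, and option names are valid XML names. The ini root element, the parameter name and the file names below are made up for illustration. Values go through normalize-space(), so runs of internal whitespace collapse to a single space (this also strips the CR of CRLF line endings when the text is passed as a parameter).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- INI text: by default the text of the source document's root element
       (external-entity approach); can be overridden as a string parameter -->
  <xsl:param name="ini" select="string(/*)"/>

  <xsl:variable name="NL" select="'&#xA;'"/>

  <xsl:template match="/">
    <!-- "ini" is a hypothetical root name so the output is well-formed -->
    <ini>
      <xsl:call-template name="sections">
        <xsl:with-param name="text" select="$ini"/>
      </xsl:call-template>
    </ini>
  </xsl:template>

  <!-- Emits one element per [section], then recurses on the following sections -->
  <xsl:template name="sections">
    <xsl:param name="text"/>
    <xsl:if test="contains($text, '[')">
      <xsl:variable name="afterOpen" select="substring-after($text, '[')"/>
      <xsl:variable name="name" select="normalize-space(substring-before($afterOpen, ']'))"/>
      <xsl:variable name="rest" select="substring-after($afterOpen, ']')"/>
      <!-- The next header is a '[' at the start of a line -->
      <xsl:variable name="nextHeader" select="concat($NL, '[')"/>
      <xsl:element name="{$name}">
        <xsl:call-template name="options">
          <xsl:with-param name="text">
            <xsl:choose>
              <xsl:when test="contains($rest, $nextHeader)">
                <xsl:value-of select="substring-before($rest, $nextHeader)"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="$rest"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:element>
      <xsl:if test="contains($rest, $nextHeader)">
        <xsl:call-template name="sections">
          <xsl:with-param name="text"
              select="concat('[', substring-after($rest, $nextHeader))"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>

  <!-- Emits one element per "name = value" line; lines without '=' are skipped -->
  <xsl:template name="options">
    <xsl:param name="text"/>
    <xsl:if test="string-length($text) &gt; 0">
      <xsl:variable name="line">
        <xsl:choose>
          <xsl:when test="contains($text, $NL)">
            <xsl:value-of select="substring-before($text, $NL)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$text"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:if test="contains($line, '=')">
        <xsl:element name="{normalize-space(substring-before($line, '='))}">
          <xsl:value-of select="normalize-space(substring-after($line, '='))"/>
        </xsl:element>
      </xsl:if>
      <!-- Recurse on the remaining lines (empty once the last line is done) -->
      <xsl:call-template name="options">
        <xsl:with-param name="text" select="substring-after($text, $NL)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Saved as ini2xml.xsl, it would be run either as xsltproc ini2xml.xsl wrapper.xml (with wrapper.xml being the entity wrapper that references the INI file) or as xsltproc --stringparam ini "$(cat test.ini)" ini2xml.xsl wrapper.xml, in which case wrapper.xml can be any well-formed document.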

Update: The OP has now provided specific data, so that a complete solution is possible:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Read the whole INI file as a single string -->
  <xsl:variable name="vText" select=
    "unparsed-text('file:///c:/temp/delete/test.ini')"/>

  <!-- Split into lines (LF or CR+LF) and drop empty lines -->
  <xsl:variable name="vLines" as="xs:string*" select=
    "tokenize($vText, '&#xD;?&#xA;')[.]"/>

  <xsl:variable name="vLineCnt" select="count($vLines)"/>

  <!-- Indexes of the "[section]" header lines -->
  <xsl:variable name="vSectLinesInds" as="xs:integer*" select=
    "for $i in 1 to $vLineCnt
       return
         if(starts-with(normalize-space($vLines[$i]), '['))
           then $i
           else ()
    "/>

  <xsl:variable name="vSectCnt" select="count($vSectLinesInds)"/>

  <xsl:template match="/">
    <xsl:for-each select="$vSectLinesInds">
      <xsl:variable name="vPos" select="position()"/>
      <xsl:variable name="vInd" as="xs:integer" select="."/>
      <xsl:variable name="vthisLine" as="xs:string" select="$vLines[$vInd]"/>

      <!-- A section ends just before the next header, or after the last line -->
      <xsl:variable name="vNextSectInd" select=
        "if($vPos eq $vSectCnt)
           then $vLineCnt + 1
           else $vSectLinesInds[$vPos + 1]
        "/>

      <!-- current() is the index of this section's header line -->
      <xsl:variable name="vInnerLines" select=
        "$vLines[position() gt current() and position() lt $vNextSectInd]"/>

      <!-- "[section1]" tokenizes to ('', 'section1', '') -->
      <xsl:variable name="vName" select="tokenize($vthisLine, '\[|\]')[2]"/>

      <xsl:element name="{$vName}">
        <xsl:for-each select="$vInnerLines">
          <!-- Split "name = value" at '=' and any surrounding spaces -->
          <xsl:variable name="vInnerParts" select="tokenize(., '[ ]*=[ ]*')"/>

          <xsl:element name="{$vInnerParts[1]}">
            <xsl:value-of select="$vInnerParts[2]"/>
          </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied to any XML document (the source document is not used) and the file at C:\temp\delete\test.ini has the following content:

[section1]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5

[section2]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5

[section3]
option1 = values1
option2 = values2
option3 = values3
option4 = values4
option5 = values5

the wanted, correct result is produced:

<section1>
   <option1>values1</option1>
   <option2>values2</option2>
   <option3>values3</option3>
   <option4>values4</option4>
   <option5>values5</option5>
</section1>
<section2>
   <option1>values1</option1>
   <option2>values2</option2>
   <option3>values3</option3>
   <option4>values4</option4>
   <option5>values5</option5>
</section2>
<section3>
   <option1>values1</option1>
   <option2>values2</option2>
   <option3>values3</option3>
   <option4>values4</option4>
   <option5>values5</option5>
</section3>

Upvotes: 4

Vincent Biragnet

Reputation: 2998

If it's possible for you to use an XSLT 2.0 processor, you've got the unparsed-text() function that can import flat files.

Once the file is imported, you have the usual string tools in XPath 2.0 to handle your data (regular expressions, translate(), ...); see: http://www.w3.org/TR/xpath-functions/#string-functions.

Upvotes: 1
