djm

Reputation: 342

Use XSLT to convert delimited text to XML

I have some double-pipe-delimited data inside an XML element, and I would like to convert the delimited text to XML elements.
Within each delimited item, a colon separates the heading from the data, like so: ||tagname:data||
The headings (tag names) could be anything; the input below is just one example. I don't know in advance what I will get, so I must use whatever appears in front of the colon as the element name.

 <doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       ||phone:3282||email:[email protected]||officenumber:D-107A||vcard:https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b||photo:https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||pronunciation:https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||  </str>
    </arr>
</doc>  

Can I use XSLT to transform that XML into the following?

 <doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname  
          <phone>3282</phone>
          <email>[email protected]</email>
          <officenumber>D-107A</officenumber>
          <vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
          <photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
          <pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>
      </str>
    </arr>
</doc>  

The URLs will have to be wrapped in CDATA, and the delimited text has to be replaced by the generated elements.
Can someone point me in the right direction? Thank you.

Upvotes: 0

Views: 557

Answers (1)

Martin Honnen

Reputation: 167571

xsl:analyze-string (XSLT 2.0) can help. With Saxon 9.5, the stylesheet

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>

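<!-- Identity template: copy everything that is not handled below. -->
<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="str">
  <xsl:copy>
    <!-- Outer pass: find the whole run of ||name:value|| items;
         group 1 is that run without its first and last pipe. -->
    <xsl:analyze-string select="." regex="\|((\|[^|]+\|)+)\|">
      <xsl:matching-substring>
        <!-- Inner pass: turn each |name:value| item into an element
             named after the text in front of the colon. -->
        <xsl:analyze-string select="regex-group(1)" regex="\|(\w+):([^|]+)\|">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:matching-substring>
      <!-- Text outside the delimited run is copied unchanged. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>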

</xsl:stylesheet>

transforms the input

<doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       ||phone:3282||email:[email protected]||officenumber:D-107A||vcard:https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b||photo:https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||pronunciation:https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||  </str>
    </arr>
</doc>

into the result

<doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       <phone>3282</phone>
         <email>[email protected]</email>
         <officenumber>D-107A</officenumber>
         <vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
         <photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
         <pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>  
      </str>
    </arr>
</doc>
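
The question also asks for the URLs to be wrapped in CDATA, which the result above does not do. A minimal sketch of one way to get there, under two assumptions that are not in the original post: the stylesheet above is saved as pipes-to-xml.xsl (a hypothetical file name), and the names of the elements that need CDATA are known up front (here the three URL fields from the sample). An importing stylesheet can then add cdata-section-elements to the output declaration:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: pipes-to-xml.xsl is a hypothetical file name for the
     stylesheet shown in this answer; all templates come from there. -->
<xsl:import href="pipes-to-xml.xsl"/>

<!-- Assumption: vcard, photo and pronunciation are the elements holding URLs.
     Output declarations are merged across imports, so indent="yes" from the
     imported stylesheet still applies. -->
<xsl:output cdata-section-elements="vcard photo pronunciation"/>

</xsl:stylesheet>
```

CDATA is only a serialization choice, so a parser reading the result sees the same text as with the escaped &amp;amp; form. If the tag names really are not known in advance, a fixed list like this will not cover them.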

Upvotes: 1
