I am converting an XML file to another XML file using XSLT. The XML file has an element like this:
<Course>
<ID>1001</ID>
<Seats>10</Seats>
<Description>Department: CS , Faculty: XYZ</Description>
</Course>
Is there any way in XSLT to generate a new XML file that looks like this:
<Course>
<ID>1001</ID>
<Seats>10</Seats>
<Department> CS </Department>
<Faculty> XYZ</Faculty>
</Course>
That is, I want to split the Description element into two different elements, Department and Faculty, which are basically its content separated by a comma. I used XMLSpy to write my XSLT.
Thank you in advance.
Upvotes: 1
Here's one possible XSLT 2.0 solution based on the identity transform and the tokenize string function, used in a template specific to the Description element.
The general idea is to first split the Description string on "," and then split each of the resulting substrings on ":", picking only the last part.
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- Identity transform: copy all nodes and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Description">
    <!-- Split the text on "," into "Department: CS " and " Faculty: XYZ" -->
    <xsl:variable name="tokens" select="fn:tokenize(text(),',')"/>
    <!-- Split each token on ":", keep the second part, and trim whitespace -->
    <xsl:element name="Department"><xsl:value-of select="fn:normalize-space(fn:tokenize($tokens[1],':')[2])"/></xsl:element>
    <xsl:element name="Faculty"><xsl:value-of select="fn:normalize-space(fn:tokenize($tokens[2],':')[2])"/></xsl:element>
  </xsl:template>
</xsl:stylesheet>
The normalize-space function is invoked as a last step, to strip leading/trailing spaces; if this is not necessary, just leave that bit out.
Caveat emptor: this assumes that the format of the Description text is fixed (i.e. Department and Faculty are always present, in the same order). It also assumes that neither ":" nor "," occurs inside the Department or Faculty values themselves.
The transformation above yields the expected result:
<?xml version="1.0" encoding="UTF-8"?><Course>
<ID>1001</ID>
<Seats>10</Seats>
<Department>CS</Department>
<Faculty>XYZ</Faculty>
</Course>
Note that having structured information inside a run of plain text is not exactly making the best use of XML, which is all about structure, but I guess that format is not something you have control over.
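If only an XSLT 1.0 processor is available (an assumption; the question does not say which XSLT version is in use), tokenize cannot be used, but the same split can be sketched with the XPath 1.0 functions substring-before and substring-after. This is a minimal sketch under the same fixed-format assumptions as above, not a tested drop-in replacement:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy everything that is not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Description">
    <!-- Text before the first comma, e.g. "Department: CS " -->
    <xsl:variable name="first" select="substring-before(., ',')"/>
    <!-- Text after the first comma, e.g. " Faculty: XYZ" -->
    <xsl:variable name="second" select="substring-after(., ',')"/>
    <!-- Keep what follows the first ":" in each half and trim whitespace -->
    <Department>
      <xsl:value-of select="normalize-space(substring-after($first, ':'))"/>
    </Department>
    <Faculty>
      <xsl:value-of select="normalize-space(substring-after($second, ':'))"/>
    </Faculty>
  </xsl:template>
</xsl:stylesheet>
```

For the sample Course element this should give the same Department and Faculty values (CS and XYZ) as the XSLT 2.0 version. Note that substring-after keeps everything after the first ":", whereas the tokenize version keeps only the second token; the two only differ if a value itself contains a ":".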
Update based on comments:
A more robust alternative solution based on regular expression matching is listed below. In this case, only Description elements matching the Department, Faculty pattern are rewritten; otherwise the original Description element is passed through:
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- Identity transform: copy all nodes and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Description">
    <xsl:analyze-string select="." regex="\s*Department:\s*(.+)\s*,\s*Faculty:\s*(.+)\s*">
      <!-- Pattern found: emit the two captured groups as separate elements -->
      <xsl:matching-substring>
        <xsl:element name="Department"><xsl:value-of select="regex-group(1)"/></xsl:element>
        <xsl:element name="Faculty"><xsl:value-of select="regex-group(2)"/></xsl:element>
      </xsl:matching-substring>
      <!-- Text that does not match is passed through inside a Description element -->
      <xsl:non-matching-substring>
        <xsl:element name="Description"><xsl:copy/></xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
The key idea here is to use xsl:analyze-string to test via an XSLT regular expression if the expected pattern is found, and capture the respective values in that case. If no match is found, the original contents of the Description element are copied through.
Note: Integrating this with the root element is left as an exercise for the reader (since the OP example does not show where the Course elements fit).
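One thing to be aware of: the pattern above is not anchored, so a Description with extra text in front of the Department part would be split into a non-matching piece (wrapped in its own Description element) followed by the Department and Faculty elements. The greedy (.+) group also captures any whitespace in front of the comma. A possible variation, shown here as an untested sketch, anchors the pattern with ^ and $ and uses the reluctant quantifier .+? so that a Description either matches as a whole or is passed through as a whole:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy everything that is not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Description">
    <!-- ^ and $ force the whole string to match; .+? stops before trailing whitespace -->
    <xsl:analyze-string select="."
        regex="^\s*Department:\s*(.+?)\s*,\s*Faculty:\s*(.+?)\s*$">
      <xsl:matching-substring>
        <Department><xsl:value-of select="regex-group(1)"/></Department>
        <Faculty><xsl:value-of select="regex-group(2)"/></Faculty>
      </xsl:matching-substring>
      <!-- No match: the complete original text is kept in one Description element -->
      <xsl:non-matching-substring>
        <Description><xsl:value-of select="."/></Description>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

As with the version above, "." does not match line breaks by default, so a Description whose text spans several lines would be passed through unchanged unless the "s" flag is added via the flags attribute.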
Upvotes: 2