Raghavendra Nilekani

Reputation: 406

XSLT to split text data into groups of multiple lines

I am trying to write an XSLT stylesheet that splits multi-line text data and produces XML in which the lines are grouped into chunks of a fixed number of lines.

For example, if my input XML is like this:

<?xml version="1.0" encoding="UTF-8"?>
<csv>
    <data>Id,Name,Address,Location,Extid,contact
          1,raagu1,hosakote1,bangalore1,123,contact1
          2,raagu2,hosakote2,bangalore2,123,contact2
          3,raagu3,hosakote3,bangalore3,123,contact3
          4,raag4,hosakote4,bangalore4,123,contact4
          5,raagu5,hosakote5,bangalore5,123,contact5
          6,raagu6,hosakote6,bangalore6,123,contact6
          7,raagu7,hosakote7,bangalore7,123,contact7
    </data>
</csv>

Here the text under the data element consists of a header line (Id,Name,Address,Location,Extid,contact) followed by data lines whose fields correspond to the header fields.

If the fixed group size is 4, i.e. groups of 4 data lines, then my output XML should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<csv>
    <data>
        Id,Name,Address,Location,Extid,contact
        1,raagu1,hosakote1,bangalore1,123,contact1
        2,raagu2,hosakote2,bangalore2,123,contact2
        3,raagu3,hosakote3,bangalore3,123,contact3
        4,raag4,hosakote4,bangalore4,123,contact4
    </data>
    <data>
        Id,Name,Address,Location,Extid,contact
        5,raagu5,hosakote5,bangalore5,123,contact5
        6,raagu6,hosakote6,bangalore6,123,contact6
        7,raagu7,hosakote7,bangalore7,123,contact7
    </data>
</csv>

To achieve this, I explored XSLT and tried the following stylesheet:

 <xsl:stylesheet version = "2.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes" method="xml" encoding="UTF-8"/>

<xsl:template match = "/csv/data">

    <xsl:variable name="header" select="substring-before(.,'&#10;')"/>
    <xsl:variable name="data" select="substring-after(.,'&#10;')"/>

    <csv>

        <xsl:for-each select = "tokenize($data, '\n')">

            <xsl:variable name="count" select="position()"/>

            <data>
                <xsl:value-of select="$header"/>
                <xsl:text>&#10;</xsl:text>
                <xsl:sequence select = "."/>
            </data>

        </xsl:for-each>

    </csv>

</xsl:template>
</xsl:stylesheet>

With this, the output I got was

<?xml version="1.0" encoding="UTF-8"?>
<csv>
<data>
    Id,Name,Address,Location,Extid,contact
    1,raagu1,hosakote1,bangalore1,123,contact1
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    2,raagu2,hosakote2,bangalore2,123,contact2
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    3,raagu3,hosakote3,bangalore3,123,contact3
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    4,raag4,hosakote4,bangalore4,123,contact4
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    5,raagu5,hosakote5,bangalore5,123,contact5
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    6,raagu6,hosakote6,bangalore6,123,contact6
</data>
<data>
    Id,Name,Address,Location,Extid,contact
    7,raagu7,hosakote7,bangalore7,123,contact7
</data>
</csv>

I could not quite get it right, since it creates a group for every line. I think I am missing something to do with concatenation. Are there any functions in XSLT that can split the text into groups of multiple lines and create a single element for each group, with very good performance? I am OK with XSLT 2.0 functions. The code should work even for 100,000+ data sets.

Thanks

Raagu

Upvotes: 1

Views: 4355

Answers (2)

Philipp

Reputation: 4729

This is a basic solution (not using group-adjacent) that creates the elements manually. It is not very elegant, but it works and is easy to follow.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes" method="xml" encoding="UTF-8"/>

    <xsl:template match="/csv/data">
        <xsl:variable name="header" select="substring-before(.,'&#10;')"/>
        <xsl:variable name="data" select="substring-after(.,'&#10;')"/>
        <xsl:variable name="numberOfRows" select="4"/>
        <csv>
            <xsl:for-each select="tokenize($data, '\n')">
                <xsl:variable name="count" select="position()-1"/>
                <xsl:variable name="modulo" select="$count mod $numberOfRows"/>
                <xsl:if test="$modulo = 0">
                    <xsl:text disable-output-escaping="yes">&lt;data></xsl:text>
                    <xsl:value-of select="$header"/>
                    <xsl:text>&#10;</xsl:text>
                </xsl:if>
                <xsl:sequence select="."/>
                <xsl:text>&#10;</xsl:text>

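                <!-- Close the group when it is full, or when the input ends with a partial group -->
                <xsl:if test="$modulo = ($numberOfRows - 1) or position() = last()">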
                    <xsl:text disable-output-escaping="yes">&lt;/data></xsl:text>
                </xsl:if>

            </xsl:for-each>
        </csv>
    </xsl:template>

</xsl:stylesheet>
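
For comparison, here is a minimal sketch of the same fixed-size chunking that builds real data elements instead of writing tags as text. It is offered as an alternative because disable-output-escaping is an optional serializer feature and may be ignored when the result is not serialized directly. The variable name rows and the filter that skips whitespace-only lines are assumptions added for illustration; they are not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes" method="xml" encoding="UTF-8"/>

    <xsl:template match="/csv/data">
        <xsl:variable name="header" select="substring-before(., '&#10;')"/>
        <!-- Assumption: whitespace-only lines (e.g. the indentation before </data>) are skipped -->
        <xsl:variable name="rows"
            select="tokenize(substring-after(., '&#10;'), '\n')[normalize-space(.)]"/>
        <xsl:variable name="numberOfRows" select="4"/>
        <csv>
            <!-- One iteration per chunk: ceiling(count / numberOfRows) in integer arithmetic -->
            <xsl:for-each select="1 to (count($rows) + $numberOfRows - 1) idiv $numberOfRows">
                <data>
                    <xsl:value-of select="$header"/>
                    <xsl:text>&#10;</xsl:text>
                    <!-- Rows of the current chunk; the last chunk may be shorter -->
                    <xsl:value-of
                        select="subsequence($rows, (. - 1) * $numberOfRows + 1, $numberOfRows)"
                        separator="&#10;"/>
                </data>
            </xsl:for-each>
        </csv>
    </xsl:template>

</xsl:stylesheet>
```

Because each data element is opened and closed by the stylesheet itself, a final partial group is always well-formed, and no output-escaping tricks are needed.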

Upvotes: 2

Martin Honnen

Reputation: 167571

Do you really want to create an XML result format that still contains comma-separated and line-separated data? I would consider cleaning up the data and marking it up properly with XML.

But as for the grouping, here is an example:

<xsl:stylesheet version = "2.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="chunk-size" select="4" as="xs:integer"/>

<xsl:output indent="yes" method="xml" encoding="UTF-8"/>

<xsl:template match = "/csv/data">

    <xsl:variable name="header" select="substring-before(.,'&#10;')"/>
    <xsl:variable name="data" select="substring-after(.,'&#10;')"/>

    <csv>

        <xsl:for-each-group select = "tokenize($data, '\n')" group-adjacent="(position() - 1) idiv $chunk-size">



            <data>
                <xsl:value-of select="$header"/>
                <xsl:text>&#10;</xsl:text>
                <xsl:value-of select = "current-group()" separator="&#10;"/>
            </data>

        </xsl:for-each-group>

    </csv>

</xsl:template>
</xsl:stylesheet>
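
As for marking the data up properly, the following is only a sketch of what that could look like, combined with the same grouping. It assumes that the header fields are valid XML element names, that no field contains a comma or quoting, and that a wrapper element named row (a hypothetical name, chosen for illustration) is acceptable.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="chunk-size" select="4" as="xs:integer"/>

<xsl:output indent="yes" method="xml" encoding="UTF-8"/>

<xsl:template match="/csv/data">

    <!-- Trim every line and drop whitespace-only ones -->
    <xsl:variable name="lines" as="xs:string*"
      select="for $line in tokenize(., '\n')[normalize-space(.)]
              return normalize-space($line)"/>

    <!-- Assumption: header fields are usable as element names -->
    <xsl:variable name="names" select="tokenize($lines[1], ',')"/>

    <csv>
        <xsl:for-each-group select="$lines[position() gt 1]"
          group-adjacent="(position() - 1) idiv $chunk-size">
            <data>
                <xsl:for-each select="current-group()">
                    <!-- Naive split: does not handle quoted fields -->
                    <xsl:variable name="values" select="tokenize(., ',')"/>
                    <row>
                        <xsl:for-each select="$names">
                            <xsl:variable name="i" select="position()"/>
                            <xsl:element name="{.}">
                                <xsl:value-of select="$values[$i]"/>
                            </xsl:element>
                        </xsl:for-each>
                    </row>
                </xsl:for-each>
            </data>
        </xsl:for-each-group>
    </csv>

</xsl:template>
</xsl:stylesheet>
```

With this approach each data element holds up to chunk-size row elements, and each row holds one child element per header field, so the header line no longer needs to be repeated in every group.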

Upvotes: 3
