Roman

Reputation: 287

How to remove line breaks in XML text nodes

Let's say I have very badly formatted XML like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<tag1>
  <tag2>
       Some text
  </tag2>
  <tag3>
       Some other text
  </tag3>
</tag1>

I have to transform it into a form like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<tag1>
  <tag2>Some text</tag2>
  <tag3>Some other text</tag3>
</tag1>

I've tried to use XSLT transformation like this (also in many variants):

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)" />
</xsl:template>
<xsl:template match="node()|@*">
  <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
</xsl:template>
</xsl:stylesheet>

I've used this online tool to verify it, but I never get the expected result...

Could you tell me the easiest way to do this?

Upvotes: 0

Views: 858

Answers (2)

michael.hor257k

Reputation: 116959

The reason your attempt fails is that you have two templates matching text nodes:

  • your first template matches only text nodes;
  • your second template matches nodes of any type, including text nodes.

Most processors resolve such a conflict by applying the last matching template in the stylesheet (although the specification permits signaling an error instead).

The simple solution is to change the order of the templates, so that the text() template comes last, and hope that the processor will choose to recover from the error.

Alternatively, you could raise the priority of the first template above -0.5 (the default priority of both templates, including the identity transform template), e.g.:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:template match="text()" priority="0">
    <xsl:value-of select="normalize-space(.)" />
</xsl:template>

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
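
Another way to avoid the conflict altogether, shown here only as a sketch and not taken from your attempt, is to narrow the identity template so that it no longer matches text nodes. The pattern below is an assumption about what you need to copy (elements, attributes, comments and processing instructions); with it, only one template can match a text node, so template order and priority stop mattering:

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template restricted to non-text nodes, so it cannot compete with text() -->
<xsl:template match="*|@*|comment()|processing-instruction()">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

<!-- The only template that matches text nodes: trims and collapses whitespace -->
<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

</xsl:stylesheet>
```

Whitespace-only text nodes are normalized to an empty string, so they disappear from the result.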

To get the output shown in your question, you will also want to add:

<xsl:output indent="yes"/>

at the top level of your stylesheet.
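
Put together, the stylesheet might look like the sketch below. The encoding attribute is an assumption, added only to match the XML declaration in your expected output; the exact indentation still depends on the processor.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Top-level output settings; encoding is an assumption taken from the question's XML declaration -->
<xsl:output indent="yes" encoding="ISO-8859-1"/>

<!-- Explicit priority beats the default -0.5 of the identity template -->
<xsl:template match="text()" priority="0">
    <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

<!-- Identity template: copies every other node unchanged -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```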

Upvotes: 2

Sebastien

Reputation: 2714

I don't know which XSLT engine you are using, but this works for me:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space()"/>
    </xsl:template>
    
</xsl:stylesheet>

Produces:

<?xml version="1.0"?><tag1><tag2>Some text</tag2><tag3>Some other text</tag3></tag1>

See it working here: https://xsltfiddle.liberty-development.net/ehW12ga

Upvotes: 1
