Yalmar

Reputation: 435

XSLT Select attribute value of node with another attribute with a given value

I have a large number of HTML files like the following 01.html:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>My Title</title> 
  </head>
  <body>
    <item itemprop="itemprop1" content="content1" /> 
    <item itemprop="itemprop2" content="content2" /> 
    <item itemprop="itemprop3" content="content3" /> 
    <item itemprop="itemprop4" content="content4" />
    <item itemprop="itemprop5" content="content5" />
    <item itemprop="itemprop6" content="content6" />
    <item itemprop="itemprop7" content="content7" />
    <item itemprop="itemprop8" content="content8" />
    <item itemprop="itemprop9" content="content9" />
  </body>
</html>

Each HTML file contains exactly one item element with itemprop="itemprop1", and likewise for itemprop2, itemprop3, and so on.

I would like the following text output:

content1|content5

that is, the concatenation of:

  1. the value of the content attribute of the item with itemprop="itemprop1",
  2. a pipe "|",
  3. the value of the content attribute of the item with itemprop="itemprop5".

I run the following bash command:

xsltproc 01.xslt 01.html >> 02.txt

where 01.xslt is the following:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="body">
  <xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>|<xsl:value-of select="item[@itemprop='itemprop5']/@content"/>
 </xsl:template>

</xsl:stylesheet>

Unfortunately it doesn't work. What is the correct xslt file?

UPDATE

This is the final working example.

01.html is the following:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>My Title</title> 
  </head>
  <body>
    <item itemprop="itemprop1" content="content1" /> 
    <item itemprop="itemprop2" content="content2" /> 
    <item itemprop="itemprop3" content="content3" /> 
    <item itemprop="itemprop4" content="content4" />
    <item itemprop="itemprop5" content="content5" />
    <item itemprop="itemprop6" content="content6" />
    <item itemprop="itemprop7" content="content7" />
    <item itemprop="itemprop8" content="content8" />
    <item itemprop="itemprop9" content="content9" />
  </body>
</html>

01.xslt is the following:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes" method="text"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="html">
  <xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
  <xsl:text>|</xsl:text>
  <xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
 </xsl:template>

</xsl:stylesheet>

and the output 02.txt is the following:

content1|content5

Upvotes: 2

Views: 8438

Answers (3)

imran

Reputation: 461

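<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <!-- Match the document root and address both items by their full path -->
    <xsl:template match="/">
        <xsl:value-of select="html/body/item[@itemprop='itemprop1']/@content"/>|<xsl:value-of select="html/body/item[@itemprop='itemprop5']/@content"/>
    </xsl:template>
</xsl:stylesheet>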

Upvotes: 0

zx485

Reputation: 29052

Your main problem using xsltproc is that you're trying to process HTML instead of XML. The difference is in the <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag, which isn't closed, so the input is not well-formed XML and the XSLT processor reports an error. Add a closing slash to make it

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

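If you fix this problem and add a template that suppresses the text() nodes no other template matches (the built-in rules would otherwise copy text such as the title into the output), like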

<xsl:template match="text()" />

your XSLT will do what you want.
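Put together, a minimal sketch of the question's stylesheet with only that template added might look like the following. It assumes the meta element in 01.html has been closed as described above. Using method="text" and addressing both items relative to body are my choices for the sketch, not requirements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: plain-text output is wanted, so no XML declaration is written -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Emit the two attribute values, addressed relative to body -->
  <xsl:template match="body">
    <xsl:value-of select="item[@itemprop='itemprop1']/@content"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="item[@itemprop='itemprop5']/@content"/>
  </xsl:template>

  <!-- Override the built-in rule that would copy text nodes such as the title -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

The built-in rules still walk from the root through html and head down to body, so match="body" can stay as it was. The empty text() template only stops stray text from leaking into the result.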

Upvotes: 1

Valdi_Bo

Reputation: 31011

Actually, XSLT processes XML files, not HTML.

Your source HTML almost meets the requirements of well-formed XML. There is only one error: your meta element is not closed, so I changed it to:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

(adding / before the closing >). Otherwise the XSLT processor displays an error message (at least in my installation).

As far as your XSLT is concerned, I made a few corrections:

  • match="body" changed to match="html", so the built-in rules no longer copy the title text into the output,
  • added // in the second xsl:value-of,
  • changed the "bare" | to <xsl:text>|</xsl:text>, only for readability (long lines get cut off on smaller monitors),
  • added <xsl:output method="text"/>, as your output is plain text rather than XML.

The last two changes are optional; you can ignore them.

So the whole stylesheet can look like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="html">
    <xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
  </xsl:template>
</xsl:stylesheet>
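
One thing to watch if you append the result of many files to 02.txt with >>: as far as I know, method="text" writes no trailing newline, so consecutive results would run together on one line. A hedged variant of the stylesheet above, assuming one result per line is wanted (that is my assumption, the question does not say so), emits the line feed explicitly:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="html">
    <xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
    <!-- Assumption: end each result with a line feed so appended results stay on separate lines -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The whitespace inside xsl:text is preserved, so the character reference reaches the output as a real line break.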

Upvotes: 3
