Reputation: 435
I have a large number of html files like the following 01.html file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop2" content="content2" />
<item itemprop="itemprop3" content="content3" />
<item itemprop="itemprop4" content="content4" />
<item itemprop="itemprop5" content="content5" />
<item itemprop="itemprop6" content="content6" />
<item itemprop="itemprop7" content="content7" />
<item itemprop="itemprop8" content="content8" />
<item itemprop="itemprop9" content="content9" />
</body>
</html>
There is only one item node with itemprop="itemprop1" in each html file. Same for itemprop2, itemprop3, etc.
I would like to have the following txt file output:
content1|content5
that is the concatenation of: 1. the value of the attribute content for the item with itemprop="itemprop1" 2. a pipe "|" 3. the value of the attribute content for the item with itemprop="itemprop5"
I run the following bash script:
xsltproc 01.xslt 01.html >> 02.txt
where 01.xslt is the following:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="body">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>|<xsl:value-of select="item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
Unfortunately it doesn't work. What is the correct xslt file?
UPDATE
This is the final working example.
01.html is the following:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop2" content="content2" />
<item itemprop="itemprop3" content="content3" />
<item itemprop="itemprop4" content="content4" />
<item itemprop="itemprop5" content="content5" />
<item itemprop="itemprop6" content="content6" />
<item itemprop="itemprop7" content="content7" />
<item itemprop="itemprop8" content="content8" />
<item itemprop="itemprop9" content="content9" />
</body>
</html>
01.xslt is the following:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="html">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
and the output 02.txt is the following:
content1|content5
Upvotes: 2
Views: 8438
Reputation: 461
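<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select="html/body/item[@itemprop='itemprop1']/@content"/>|<xsl:value-of select="html/body/item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>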
Upvotes: 0
Reputation: 29052
Your main problem with xsltproc is that you're trying to process HTML instead of XML. The difference is in the <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag, which isn't closed, so the input is not well-formed XML and the XSLT processor reports an error. So add a closing slash to make it
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If you fix this problem and add a template that removes 'non-matching' text() nodes, like
<xsl:template match="text()" />
your XSLT will do what you want.
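To see why the unclosed meta tag matters, here is a minimal sketch that uses Python's standard xml.etree.ElementTree parser as a stand-in for the XML parser behind xsltproc. That substitution is an assumption made for illustration, and the error text xsltproc prints will differ. The is_well_formed helper and the shortened markup are hypothetical, not taken from the question.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's markup: the meta tag is never closed.
UNCLOSED = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    '<title>My Title</title>'
    '</head><body/></html>'
)

# Same markup with the meta tag written in self-closing form.
CLOSED = UNCLOSED.replace('charset=UTF-8">', 'charset=UTF-8" />')


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The unclosed <meta> is still open when </head> arrives,
        # so the parser reports a mismatched tag.
        return False


if __name__ == "__main__":
    print(is_well_formed(UNCLOSED))  # False
    print(is_well_formed(CLOSED))    # True
```

Any strict XML parser should reject the first string for the same reason: the close tag for head does not match the element that is currently open.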
Upvotes: 1
Reputation: 31011
Actually, XSLT processes XML files, not HTML.
Your source HTML almost meets the requirements of well-formed XML. There is only one error: your meta element is not closed, so I changed it to:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
(adding / before the closing >).
Otherwise the XSLT processor displays an error message (at least in my installation).
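With a large number of files, editing each meta tag by hand gets tedious. As a rough sketch only, the tag could be closed programmatically before running the transformation. The close_meta_tags function and its regular expression are hypothetical helpers written in Python, and they assume that no attribute value contains a literal > character.

```python
import re
import xml.etree.ElementTree as ET

# Matches a <meta ...> start tag whether or not it already ends in "/>".
# Assumption: no attribute value contains a literal ">".
META_TAG = re.compile(r"<meta\b([^>]*?)\s*/?>")


def close_meta_tags(markup: str) -> str:
    """Rewrite every <meta ...> tag in self-closing form."""
    return META_TAG.sub(r"<meta\1 />", markup)


SAMPLE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
</body>
</html>"""

if __name__ == "__main__":
    fixed = close_meta_tags(SAMPLE)
    ET.fromstring(fixed)  # raises ParseError if the result is still not well-formed
    print(fixed.splitlines()[2])
```

The rewrite is idempotent, so running it again on an already fixed file leaves the file unchanged. A regular expression is a blunt tool for markup, so it is worth spot-checking a few converted files before processing the whole set.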
As far as your XSLT is concerned, I made a few corrections:
1. match="body" changed to match="html", so the title text in head is no longer copied to the output by the built-in templates.
2. Added // in the second xsl:value-of, since item is not a direct child of html.
3. Changed | to <xsl:text>|</xsl:text>, only for readability (longer lines cannot be seen on smaller monitors).
4. Added <xsl:output method="text"/>, as your output does not seem to be XML.
The last 2 changes are optional; you can ignore them.
So the whole script can look like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="html">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
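As a cross-check of what the two select expressions pick out, the same lookups can be sketched with Python's standard xml.etree.ElementTree, which supports a small XPath subset. This is a different tool from xsltproc, used here only for illustration. The extract function and the trimmed sample input are hypothetical, and the input is assumed to have the meta tag already closed.

```python
import xml.etree.ElementTree as ET

# Trimmed sample input; assumes the meta tag has already been closed.
SOURCE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop5" content="content5" />
</body>
</html>"""


def extract(markup: str, first: str = "itemprop1", second: str = "itemprop5") -> str:
    """Join the content attributes of two item elements with a pipe."""
    root = ET.fromstring(markup)
    values = []
    for name in (first, second):
        # ".//item[@itemprop='...']" mirrors //item[@itemprop='...'] in the stylesheet.
        node = root.find(f".//item[@itemprop='{name}']")
        # Like xsl:value-of on an empty node-set, a missing item yields "".
        values.append(node.get("content") if node is not None else "")
    return "|".join(values)


if __name__ == "__main__":
    print(extract(SOURCE))  # content1|content5
```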
Upvotes: 3