Reputation: 21
I'm trying to transform XHTML to text using a user-defined XSLT, which is the following:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:output method="text"/>
<xsl:template match="/html">
Reading document entitled <xsl:value-of select="head/title"/>.
The top menu for this site has the following options:
<xsl:for-each select="body//ul[@role='menubar']/li/a">
<xsl:value-of select="."></xsl:value-of> <xsl:text>
</xsl:text>
</xsl:for-each>
Now let's read the main part of the page.
<xsl:for-each select="body//main[@class='container']//(h1 | h2 | h3 | h4 | p | ul/li/a)">
<xsl:value-of select="normalize-space(.)"/><xsl:text>
</xsl:text><xsl:text>
</xsl:text>
</xsl:for-each>
The footer menu for this site has the following options:
<xsl:for-each select="body//footer[@id='wb-info']//ul/li/a">
<xsl:value-of select="."></xsl:value-of> <xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When I test it at http://xsltransform.net/ against a typical HTML document, the output is as expected.
I test the same XSLT against the same XHTML using the following Python code:
import lxml.etree as ET
html = ET.parse("../fixed_html/about.html")
xslt = ET.parse("../templates/generic.xslt")
transform = ET.XSLT(xslt)
res = transform(html)
print(res)
I get the following error:
lxml.etree.XSLTParseError: xsl:for-each : could not compile select expression 'body//main[@class='container']//(h1 | h2 | h3 | h4 | p | ul/li/a)'
My first thought is that lxml has limitations and can't handle valid XSLT. I'm hoping that's not the case, and that I just failed to set up the code correctly.
Any issues with the Python code? Can I process the XSLT above in Python some other way?
Upvotes: 1
Views: 1193
Reputation: 167696
XSLT 2 and 3 are supported in Python by Saxonica's SaxonC 11.1, released this month; see the details at https://www.saxonica.com/download/c.xml and https://www.saxonica.com/saxon-c/documentation11/index.html#!starting.
At the current stage, you need to compile/build the Python module on your own after downloading the source code and the library modules of SaxonC 11.1.
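Once the module is built, running the transformation could look roughly like the sketch below. It assumes the locally built module is importable as saxonc and that the calls match the SaxonC 11 Python API as I understand it (PySaxonProcessor, new_xslt30_processor, compile_stylesheet, transform_to_string); the helper name transform_with_saxonc is made up for illustration, so check everything against the documentation linked above.

```python
def transform_with_saxonc(source_path, stylesheet_path):
    """Run an XSLT 2.0/3.0 stylesheet with SaxonC and return the output as a string.

    Hypothetical helper; the saxonc calls below are assumed to match SaxonC 11.
    """
    # Imported lazily because the saxonc module has to be built locally first.
    from saxonc import PySaxonProcessor

    # license=False selects the open-source SaxonC-HE edition.
    with PySaxonProcessor(license=False) as proc:
        xslt_proc = proc.new_xslt30_processor()
        executable = xslt_proc.compile_stylesheet(stylesheet_file=stylesheet_path)
        return executable.transform_to_string(source_file=source_path)


# Usage with the paths from the question (requires the built saxonc module):
# print(transform_with_saxonc("../fixed_html/about.html",
#                             "../templates/generic.xslt"))
```

Since the stylesheet relies on XSLT 2.0 features, it probably also makes sense to change its version attribute to "2.0" when going this route.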
Upvotes: 2
Reputation: 117083
Your stylesheet declares version="1.0", but the code itself requires an XSLT 2.0 processor: the xpath-default-namespace attribute is an XSLT 2.0 feature, and a parenthesized path step such as //(h1 | h2 | h3 | h4 | p | ul/li/a) is XPath 2.0 syntax, which is what the error message is complaining about.
lxml uses the libxslt processor, which only supports XSLT 1.0. You will need to rewrite your stylesheet for XSLT 1.0 or find a way to incorporate an XSLT 2.0 or higher processor in your processing chain.
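As a rough illustration of the XSLT 1.0 route, here is a cut-down sketch that lxml should accept. It covers only the title and the main part of the page. The x prefix, the sample document, and the transform_xhtml helper are my own assumptions, not part of the original stylesheet. The XHTML namespace is bound to an explicit prefix instead of using xpath-default-namespace, and the parenthesized step is replaced by a predicate on the self axis.

```python
import lxml.etree as ET

# XSLT 1.0 sketch: every XHTML element name carries the (arbitrary) prefix "x",
# and //(h1 | ... | ul/li/a) becomes //*[self::x:h1 or ... ].
XSLT_10 = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">
<xsl:output method="text"/>
<xsl:template match="/x:html">
<xsl:text>Reading document entitled </xsl:text>
<xsl:value-of select="x:head/x:title"/>
<xsl:text>.&#10;</xsl:text>
<xsl:for-each select="x:body//x:main[@class='container']//*[
    self::x:h1 or self::x:h2 or self::x:h3 or self::x:h4 or self::x:p
    or self::x:a[parent::x:li/parent::x:ul]]">
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>"""

# Made-up sample input, standing in for ../fixed_html/about.html.
SAMPLE_XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>About</title></head>
<body>
<main class="container">
<h1>About  us</h1>
<p>Some text.</p>
<ul><li><a href="#">A link</a></li></ul>
</main>
</body>
</html>"""


def transform_xhtml(xhtml_bytes, xslt_bytes=XSLT_10):
    """Compile the stylesheet with lxml (libxslt) and return the text output."""
    transform = ET.XSLT(ET.fromstring(xslt_bytes))
    return str(transform(ET.fromstring(xhtml_bytes)))


if __name__ == "__main__":
    print(transform_xhtml(SAMPLE_XHTML))
```

The two menu loops can be converted the same way by prefixing each element name in their select expressions.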
When I test it at http://xsltransform.net/ against a typical HTML document, the output is as expected.
Only when you select the Saxon 9.5.1 engine. With any other processor you will get an error.
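If you are ever unsure which XSLT version a given engine implements, you can ask the processor itself with the standard system-property() function. The sketch below does this through lxml; the processor_info helper and the "|" separator are just illustrative choices.

```python
import lxml.etree as ET

# Stylesheet that reports the processor's own XSLT version and vendor.
PROBE_XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="system-property('xsl:version')"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="system-property('xsl:vendor')"/>
</xsl:template>
</xsl:stylesheet>"""


def processor_info():
    """Return (version, vendor) as reported by the XSLT processor behind lxml."""
    transform = ET.XSLT(ET.fromstring(PROBE_XSLT))
    # Any well-formed document will do as input; the template ignores it.
    result = str(transform(ET.fromstring(b"<dummy/>")))
    version, vendor = result.split("|", 1)
    return version, vendor


if __name__ == "__main__":
    print(processor_info())
```

The same stylesheet can be pasted into an online tester to compare what each of its engines reports.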
Upvotes: 2