Michael Havey

Reputation: 21

Good XSLT for Python - lxml struggles

I'm trying to transform XHTML to text using a user-defined XSLT, which is the following:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml">

<xsl:output method="text"/>

<xsl:template match="/html">
Reading document entitled <xsl:value-of select="head/title"/>.

The top menu for this site has the following options:
<xsl:for-each select="body//ul[@role='menubar']/li/a">
    <xsl:value-of select="."></xsl:value-of> <xsl:text>&#xa;</xsl:text>
</xsl:for-each>

Now let's read the main part of the page.
<xsl:for-each select="body//main[@class='container']//(h1 | h2 | h3 | h4 | p | ul/li/a)">
    <xsl:value-of select="normalize-space(.)"/><xsl:text>&#xa;</xsl:text><xsl:text>&#xa;</xsl:text>    
</xsl:for-each>

The footer menu for this site has the following options:
<xsl:for-each select="body//footer[@id='wb-info']//ul/li/a">
    <xsl:value-of select="."></xsl:value-of> <xsl:text>&#xa;</xsl:text>
</xsl:for-each>

</xsl:template>
</xsl:stylesheet>

When I test it on http://xsltransform.net/ against a typical HTML page, the output is as expected.

I test the same XSLT against the same XHTML using the following Python code:

import lxml.etree as ET

html = ET.parse("../fixed_html/about.html")
xslt = ET.parse("../templates/generic.xslt")
transform = ET.XSLT(xslt)
res = transform(html)
print(res)

I get the following error:

lxml.etree.XSLTParseError: xsl:for-each : could not compile select expression 'body//main[@class='container']//(h1 | h2 | h3 | h4 | p | ul/li/a)'

My first thought is that lxml has limitations and can't handle valid XSLT. I'm hoping that's not the case and that I simply failed to set up the code correctly.

Any issues with the Python code? Can I process the XSLT above in Python some other way?

Upvotes: 1

Views: 1193

Answers (2)

Martin Honnen

Reputation: 167696

XSLT 2.0 and 3.0 for Python are supported by Saxonica's SaxonC 11.1, released this month; see details at https://www.saxonica.com/download/c.xml and https://www.saxonica.com/saxon-c/documentation11/index.html#!starting.

At this stage, you need to compile and build the Python module yourself after downloading the SaxonC 11.1 source code and library modules.

Upvotes: 2

michael.hor257k

Reputation: 117083

Your stylesheet declares version="1.0" but the code itself requires an XSLT 2.0 processor:

  1. The xpath-default-namespace attribute is an XSLT 2.0 feature;
  2. In XPath 1.0, a parenthesized expression is allowed only at the start of a path, so a later step such as `//(h1 | h2 | h3 | h4 | p | ul/li/a)` is invalid.
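To make point 2 concrete, here is a small sketch using lxml's `etree.XPath` against a made-up XHTML fragment (the document and the `h` prefix are assumptions for illustration). It checks that libxml2's XPath 1.0 engine rejects the parenthesized step, and shows one possible XPath 1.0 rewrite that filters descendants with `self::` tests instead. Because XPath 1.0 has no equivalent of `xpath-default-namespace`, the XHTML namespace is bound to a prefix explicitly.

```python
from lxml import etree

# XHTML elements live in this namespace; XPath 1.0 has no default element
# namespace, so every element name needs a bound prefix (hypothetical "h").
NS = {"h": "http://www.w3.org/1999/xhtml"}

# A tiny made-up XHTML document for illustration.
doc = etree.fromstring(
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    b'<main class="container"><h1>Title</h1><p>Intro</p>'
    b'<ul><li><a href="#">Link</a></li></ul></main>'
    b'</body></html>'
)

# XPath 2.0 style: a parenthesized union used as a later step.
# libxml2 implements XPath 1.0 only, so compiling this is expected to fail.
try:
    etree.XPath(
        "h:body//h:main[@class='container']//(h:h1 | h:p | h:ul/h:li/h:a)",
        namespaces=NS,
    )
    parenthesized_compiles = True
except etree.XPathError:
    parenthesized_compiles = False

# XPath 1.0 equivalent: visit every descendant of <main> and keep only the
# wanted elements via self:: tests; results come back in document order.
main_parts = etree.XPath(
    "h:body//h:main[@class='container']//*["
    "self::h:h1 or self::h:h2 or self::h:h3 or self::h:h4 or self::h:p"
    " or self::h:a[parent::h:li/parent::h:ul]]",
    namespaces=NS,
)

print(parenthesized_compiles)
print([el.text for el in main_parts(doc)])
```

An alternative is a union of full paths (`$main//h:h1 | $main//h:p | ...`); the `self::` form just avoids repeating the `main` prefix for every branch.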

lxml uses the libxslt processor, which only supports XSLT 1.0. You will need to rewrite your stylesheet for XSLT 1.0 or find a way to incorporate an XSLT 2.0 or higher processor into your processing chain.
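A possible XSLT 1.0 rewrite of the stylesheet, run through lxml, might look like the following sketch. It keeps the question's `role`, `class` and `id` values, binds the XHTML namespace to a hypothetical `h` prefix in place of `xpath-default-namespace`, and replaces the parenthesized union with `self::` tests. The sample XHTML is made up; with real files, the `ET.parse(...)` calls from the question work the same way.

```python
from lxml import etree

# XSLT 1.0 sketch of the question's stylesheet. The "h" prefix is an
# assumption; literal text goes in xsl:text so the output whitespace is predictable.
XSLT_1_0 = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <xsl:output method="text"/>
  <xsl:template match="/h:html">
    <xsl:text>Reading document entitled </xsl:text>
    <xsl:value-of select="h:head/h:title"/>
    <xsl:text>.&#xa;&#xa;The top menu for this site has the following options:&#xa;</xsl:text>
    <xsl:for-each select="h:body//h:ul[@role='menubar']/h:li/h:a">
      <xsl:value-of select="."/><xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
    <xsl:text>&#xa;Now let's read the main part of the page.&#xa;</xsl:text>
    <xsl:for-each select="h:body//h:main[@class='container']//*[self::h:h1 or self::h:h2
        or self::h:h3 or self::h:h4 or self::h:p or self::h:a[parent::h:li/parent::h:ul]]">
      <xsl:value-of select="normalize-space(.)"/><xsl:text>&#xa;&#xa;</xsl:text>
    </xsl:for-each>
    <xsl:text>The footer menu for this site has the following options:&#xa;</xsl:text>
    <xsl:for-each select="h:body//h:footer[@id='wb-info']//h:ul/h:li/h:a">
      <xsl:value-of select="."/><xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

# Made-up XHTML input with a menubar, a main section and a footer.
XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>About us</title></head>
<body>
<nav><ul role="menubar"><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul></nav>
<main class="container"><h1>About us</h1><p>We   build things.</p>
<ul><li><a href="/team">Team</a></li></ul></main>
<footer id="wb-info"><ul><li><a href="/contact">Contact</a></li></ul></footer>
</body></html>"""

transform = etree.XSLT(etree.fromstring(XSLT_1_0))
result = transform(etree.fromstring(XHTML))
output = str(result)  # text output method, so str() yields the plain text
print(output)
```

The main design change is that every element name carries the `h:` prefix; without it, XSLT 1.0 patterns like `/html` match only elements in no namespace and silently select nothing in XHTML.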


> When I test in http://xsltransform.net/, applying it a typical HTML, the output is as expected.

Only when you select the Saxon 9.5.1 engine. With any other processor you will get an error.
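One way to check which processor and XSLT version a given setup actually runs is to ask it through the standard `system-property()` function. The minimal probe below (an illustrative stylesheet, not from the original) should report version 1.0 and vendor libxslt under lxml, whereas an XSLT 2.0 or 3.0 processor would report a higher version. lxml also exposes the linked libxslt version as `etree.LIBXSLT_VERSION`.

```python
from lxml import etree

# Illustrative probe stylesheet: prints the XSLT version and vendor
# that the running processor reports about itself.
PROBE = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="system-property('xsl:version')"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="system-property('xsl:vendor')"/>
  </xsl:template>
</xsl:stylesheet>""")

# Any input document will do; the probe ignores its content.
report = str(etree.XSLT(PROBE)(etree.XML(b"<dummy/>")))
print(report)
print(etree.LIBXSLT_VERSION)  # version tuple of the libxslt library lxml uses
```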

Upvotes: 2
