Pradeep Anand

Reputation: 155

How to do an xsl:template match for an HTML string

I have a scenario where I need to render HTML in a PDF using XSLT. I have some HTML content in an XML file, like:

<section>
&lt;p&gt;&lt;b&gt;&lt;u&gt;Heelo&lt;/u&gt;&lt;/b&gt;&lt;/p&gt;
</section>

I need to render this in the pdf.

 <xsl:template match="b">
    <fo:inline font-weight="bold">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

<xsl:template match="u">
    <fo:inline text-decoration="underline">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

<xsl:template match="i">
    <fo:inline font-style="italic">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

But this template match is not working. How can I achieve this, or is there any way to replace &lt; with < and &gt; with > while creating the XML in Java?

Thanks for the help in advance !!!

Upvotes: 0

Views: 716

Answers (1)

Martin Honnen

Reputation: 167571

Your templates don't match because the section element contains only escaped markup, i.e. a single text node and no b or u elements. If you want to parse HTML, you need a way to integrate an HTML parser. With an XSLT 2 processor that is possible using David Carlisle's HTML parser implemented in XSLT 2, available at https://github.com/davidcarlisle/web-xslt/blob/master/htmlparse/htmlparse.xsl. You can import it and call its function to parse the content of the section element into nodes that your templates then process:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:d="data:,dpc"
    exclude-result-prefixes="#all"
    version="3.0">

<xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/master/htmlparse/htmlparse.xsl"/>

<xsl:output indent="yes"/>

<xsl:template match="/">
  <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="first" page-height="29.7cm" page-width="21cm" margin-top="1cm" margin-bottom="2cm" margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="1cm"/>
          <fo:region-before extent="1cm"/>
          <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>


      <fo:page-sequence master-reference="first">
         <fo:flow flow-name="xsl-region-body">  
           <fo:block>
               <xsl:apply-templates/>
           </fo:block>
         </fo:flow>
      </fo:page-sequence>
  </fo:root>
</xsl:template>

<xsl:template match="section">
    <fo:block>
        <xsl:apply-templates select="d:htmlparse(., '', true())/node()"/>
    </fo:block>
</xsl:template>

<xsl:template match="b">
    <fo:inline font-weight="bold">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

<xsl:template match="u">
    <fo:inline text-decoration="underline">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

<xsl:template match="i">
    <fo:inline font-style="italic">
        <xsl:apply-templates select="*|text()" />
    </fo:inline>
</xsl:template>

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/94hvTAp

I have used your templates as shown in your question, but note that you can normally simplify every <xsl:apply-templates select="*|text()" /> to <xsl:apply-templates/>.
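As a sketch of that simplification, the three inline templates could be written as follows. It assumes the same xsl and fo namespace declarations as the stylesheet above. Without a select attribute, xsl:apply-templates processes all child nodes rather than only elements and text nodes, but the built-in templates produce no output for comments and processing instructions, so the result should be the same for this kind of input.

```xml
<!-- Bold: process all children with the default selection -->
<xsl:template match="b">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<!-- Underline -->
<xsl:template match="u">
    <fo:inline text-decoration="underline">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<!-- Italic -->
<xsl:template match="i">
    <fo:inline font-style="italic">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>
```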

Other ways depend on the particular XSLT processor used (i.e. whether it offers an extension like http://saxonica.com/html/documentation/functions/saxon/parse-html.html or whether it allows you to implement your own extension functions integrating an HTML parser).
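With the Saxon extension linked above, the section template might look like the following sketch. It assumes a Saxon edition that provides saxon:parse-html (check the linked documentation for which editions do and for any additional parser library that must be on the classpath) and that the saxon prefix is bound to http://saxon.sf.net/ on the xsl:stylesheet element.

```xml
<!-- Assumption: xmlns:saxon="http://saxon.sf.net/" is declared on xsl:stylesheet
     and the processor in use supports the saxon:parse-html extension. -->
<xsl:template match="section">
    <fo:block>
        <!-- Parse the escaped HTML string into a document node,
             then hand its child nodes to the other templates -->
        <xsl:apply-templates select="saxon:parse-html(.)/node()"/>
    </fo:block>
</xsl:template>
```

An HTML parser may wrap the fragment in html and body elements and may put the parsed elements into the XHTML namespace. If it does, patterns such as match="b" would need to be adjusted, for example to match="*:b" or by setting xpath-default-namespace on those templates.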

If the HTML is well-formed XML (e.g. it has all necessary end tags, quotes its attribute values, and doesn't use HTML-specific entity references), then you can also use the XPath 3.1 function parse-xml-fragment with an XSLT 3 processor like Saxon 9.8 or later:

<xsl:template match="section">
    <fo:block>
        <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
    </fo:block>
</xsl:template>

https://xsltfiddle.liberty-development.net/94hvTAp/1
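Because parse-xml-fragment raises a dynamic error when the string is not well-formed, one option is to guard the call with XSLT 3's xsl:try and xsl:catch. This is only a sketch: falling back to the plain string value is my assumption about what a reasonable fallback would be, not part of the approach above.

```xml
<xsl:template match="section">
    <fo:block>
        <xsl:try>
            <!-- Normal case: the content parses as a well-formed XML fragment -->
            <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
            <xsl:catch>
                <!-- Assumed fallback: the content is not well-formed,
                     so output it as unformatted text instead of failing -->
                <xsl:value-of select="."/>
            </xsl:catch>
        </xsl:try>
    </fo:block>
</xsl:template>
```

With this in place, a section containing something like an unclosed br tag would be rendered as literal text rather than aborting the whole transformation.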

Upvotes: 4
