Joey

Reputation: 401

Need XSL to Interpret Anchor Tags Inside XML String

Problem

I'm using Apache FOP to produce a PDF from an XML file and an XSL stylesheet. The XML file is downloaded from an external site, not generated by me, so any manipulation of it would need to be scripted. Some of its elements contain HTML anchor tags that are intended to be named hyperlinks, but xsl:value-of seems to strip the anchor tags from the string it returns. The PDF report ends up with the display text of the anchor but not the hyperlink, which leads to "click here" labels that aren't clickable, with no way to tell what the URL was supposed to be because the href attribute is completely gone.

XML (snippet)

<SOLUTION>See the <A HREF="https://cheatsheetseries.owasp.org/cheatsheets/Clickjacking_Defense_Cheat_Sheet.html" TARGET="_blank">Cheat Sheet</A> for more information.</SOLUTION>

XSL (snippet)

<fo:block line-height="15pt" font-size="10pt" start-indent="2em" linefeed-treatment="preserve">
    <xsl:value-of select="SOLUTION" />
</fo:block>

Result

The PDF output contains the text inside the anchor tags, but the href is lost completely and not displayed. I thought this might be an issue with Apache FOP and the PDF generation step, but when I tried the w3schools example for xsl:value-of and changed the first title element to <title>"<a href='www.google.com'>Empire Burlesque</a>"</title>, I saw exactly the same behavior as in my PDF generation: the link doesn't work and the href attribute is gone completely.

w3schools example: https://www.w3schools.com/xml/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog_ex2

Workaround

For now, I'm going to wrap the content of the offending elements in CDATA sections. The PDF then contains the following as plain text (the tags are visible):

See the <A HREF="https://cheatsheetseries.owasp.org/cheatsheets/Clickjacking_Defense_Cheat_Sheet.html" TARGET="_blank">Cheat Sheet</A> for more information.

While this is sloppy compared to having labeled links, the link is clickable from the PDF and works and the client can also copy/paste the URL if desired.

Research

My research on here and other searches has only led to the generation/handling of anchor tags inside the XSL document and never inside the XML elements and needing to be interpreted by the XSL as a hyperlink. I can't find anything that explicitly says that xsl:value-of invalidates anchor tags inside of the string it returns, but that certainly seems to be what I am seeing.

These hyperlinks are inline inside the XML elements, and the XML data is downloaded by scripts as part of reports from an external website. My XSL therefore needs to work for whatever hyperlinks those elements contain (I do know which elements can contain hyperlinks) without me editing the XML by hand. I can manipulate it with scripts, but manual editing is not feasible because this is all part of a script that generates PDF reports from XML/XSL input.

Thanks to anyone who can offer some insight or at least confirm that what I'm trying to do is not possible. I will post any edits if I find anything through further research.

Upvotes: 0

Views: 186

Answers (1)

Michael Kay

Reputation: 163458

The xsl:value-of instruction does exactly what you are describing: it extracts the string value of an element, dropping all internal markup.

To retain the existing markup unchanged, use the xsl:copy-of instruction.
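
As a rough illustration of xsl:copy-of, here is a minimal sketch that assumes HTML output rather than XSL-FO; the wrapping p element and the match pattern are assumptions for the example, not taken from the question's stylesheet. It copies the children of SOLUTION, anchor element and attributes included, into the result tree.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: HTML output, where a copied anchor element is meaningful -->
  <xsl:output method="html"/>

  <xsl:template match="SOLUTION">
    <p>
      <!-- Copies text nodes and the A element (with HREF and TARGET) unchanged -->
      <xsl:copy-of select="node()"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Note that in an XSL-FO result a copied A element is not a formatting object, so on its own this would not produce a clickable link in the PDF; for that case the markup has to be transformed rather than copied.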

To process the internal markup, turning it into something else, use the xsl:apply-templates instruction, with appropriate template rules to handle the descendant elements encountered.
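
For the XSL-FO case in the question, the xsl:apply-templates approach could look something like the sketch below. It assumes the anchors always appear as upper-case A elements with an HREF attribute, as in the sample XML, and that the page layout (fo:root, page masters, flows) is defined elsewhere in the existing stylesheet. The link color and underline are purely cosmetic choices, not anything the question requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Same block as in the question, but the children of SOLUTION are
       processed by template rules instead of being flattened to a string -->
  <xsl:template match="SOLUTION">
    <fo:block line-height="15pt" font-size="10pt" start-indent="2em"
              linefeed-treatment="preserve">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Each anchor becomes an fo:basic-link pointing at its HREF value;
       XML names are case-sensitive, so this matches A, not a -->
  <xsl:template match="A[@HREF]">
    <fo:basic-link external-destination="url('{@HREF}')"
                   color="blue" text-decoration="underline">
      <xsl:apply-templates/>
    </fo:basic-link>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes inside SOLUTION and inside A need no rule of their own: the built-in template rule for text copies them to the result, so the surrounding sentence and the link label are preserved.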

Finally, please don't use w3schools as your primary learning resource. It's handy as a quick reference when you understand the concepts of the language and need a reminder of the details. It's not a good way of learning the concepts initially. It's also not a good place to go once you're beyond the basics and need a detailed explanation of edge cases (it tends to simplify).

Note: you say "I can't find anything that explicitly says that xsl:value-of invalidates anchor tags inside of the string it returns". w3schools says "The xsl:value-of element extracts the value of a selected node." But (typically) it doesn't say what it means to "extract the value". If you go to the XSLT 1.0 specification, however (https://www.w3.org/TR/xslt-10/#value-of) it's very clear: "The xsl:value-of element is instantiated to create a text node in the result tree. The required select attribute is an expression; this expression is evaluated and the resulting object is converted to a string as if by a call to the string function. The string specifies the string-value of the created text node." The fact that xsl:value-of creates a text node means (if you've understood the concept of the tree model) that it can't possibly retain any descendant node structure.
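
To make the quoted wording concrete, here is a small sketch that applies the string function directly, which is what xsl:value-of does implicitly. It assumes the SOLUTION element from the question is present in the input and uses text output only so the result is easy to inspect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Equivalent to select="//SOLUTION": the node-set is converted to a
         string, taking the string-value of its first node in document order -->
    <xsl:value-of select="string(//SOLUTION)"/>
  </xsl:template>

</xsl:stylesheet>
```

The string-value of an element is the concatenation of its descendant text nodes, so for the sample element the result is "See the Cheat Sheet for more information." with no trace of the A element or its HREF attribute.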

Upvotes: 2
