Reputation: 41
I'm somewhat new to XPath, so forgive me in advance. I'd like to be able to search HTML comments, conditional comments in particular, and return only certain tags such as <link> and <script>.
So far I've been able to return a collection of the comments that contain those tags with:
//comment()[contains(.,'link') or contains(.,'script')]
but at this point I'm not sure how to extract the actual tags themselves as nodes with attributes.
Can anyone help me please?
I need to be able to grab the link and script elements. I probably should have also mentioned that I'm using C# and the HTML Agility Pack. Here's an example of the markup I'm trying to retrieve elements from:
<head>
<!--[if IE 7]>
<link rel="stylesheet" href="/layout/css/IE7.css" />
<![endif]-->
<!--[if IE 9]>
<link rel="stylesheet" href="/layout/css/IE9.css" />
<![endif]-->
</head>
Upvotes: 3
Views: 614
Reputation: 243479
So far I've been able to return a collection of the comments that contain those tags with:
//comment()[contains(.,'link') or contains(.,'script')]
but at this point, I'm not sure how to extract the actual tags themselves as nodes with attributes.
This cannot be done with XPath alone, because at the time the XPath expression is evaluated there are no nodes inside a comment, just a string.
What can be done is to extract the wanted strings.
For example, when the context node is one of the two comments, evaluating this XPath expression:
substring-before(substring-after(., '>'),
'<![endif]'
)
yields, respectively:
<link rel="stylesheet" href="/layout/css/IE7.css" />
<link rel="stylesheet" href="/layout/css/IE9.css" />
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="comment()">
<xsl:value-of select=
"substring-before(substring-after(., '>'),
'&lt;![endif]'
)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<head>
<!--[if IE 7]>
<link rel="stylesheet" href="/layout/css/IE7.css" />
<![endif]-->
<!--[if IE 9]>
<link rel="stylesheet" href="/layout/css/IE9.css" />
<![endif]-->
</head>
the XPath expression is evaluated on each comment node and the result of this evaluation is output:
<link rel="stylesheet" href="/layout/css/IE7.css" />
<link rel="stylesheet" href="/layout/css/IE9.css" />
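The same string extraction can be sketched outside XSLT. The following is a minimal illustration in Python's standard library, not the C#/HTML Agility Pack setup the question uses; `CommentCollector` and `inner_markup` are hypothetical names introduced here. It assumes every comment of interest has the `[if ...]>` ... `<![endif]` shape shown above, and unlike the XPath expression it also trims the surrounding whitespace.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the raw text of every comment in an HTML document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data)


def inner_markup(comment_text):
    """Mirror substring-before(substring-after(., '>'), '<![endif]')."""
    _, _, after = comment_text.partition(">")
    before, marker, _ = after.partition("<![endif]")
    # Like XPath's substring-before, return '' when the marker is absent.
    # The strip() is an addition: the XPath version keeps the newlines.
    return before.strip() if marker else ""


SAMPLE = """<head>
<!--[if IE 7]>
<link rel="stylesheet" href="/layout/css/IE7.css" />
<![endif]-->
<!--[if IE 9]>
<link rel="stylesheet" href="/layout/css/IE9.css" />
<![endif]-->
</head>"""

collector = CommentCollector()
collector.feed(SAMPLE)
collector.close()

extracted = [inner_markup(text) for text in collector.comments]
for markup in extracted:
    print(markup)
```

The result is still a list of strings, not nodes; turning them into elements with attributes requires parsing each string again.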
Upvotes: 1
Reputation: 499062
Use the element name (what you call a "tag" is called an element in XML/XPath parlance). This selects the element together with all of its attached nodes, which include all of the element's attributes.
So, if your document looks like this:
<html>
<head>
<link rel="stylesheet" type="text/css" href="theme.css" />
</head>
<body>
...
</body>
</html>
You could use the following XPath:
/html/head/link
The returned node set will contain all link elements, and you can then query them for their attribute values.
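As a rough illustration of selecting elements and then reading their attributes, here is a small sketch using Python's `xml.etree.ElementTree` rather than the HTML Agility Pack. It relies on the sample document being well-formed XML, which real-world HTML often is not, and on ElementTree's limited XPath subset, where paths are relative to the element they are evaluated on.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<html>
<head>
<link rel="stylesheet" type="text/css" href="theme.css" />
</head>
<body>
...
</body>
</html>"""

root = ET.fromstring(DOCUMENT)  # root is the <html> element

# Evaluated from <html>, "./head/link" corresponds to the absolute
# XPath /html/head/link used in the answer.
links = root.findall("./head/link")

# Each selected element carries its attributes with it.
hrefs = [link.get("href") for link in links]
print(hrefs)
```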
Update:
Seeing the sample markup, things are a bit more complicated: you are using IE conditional comments.
This makes the items within them appear as comments to all browsers/parsers except IE. This is a problem, as you want to retrieve the <link> "elements" embedded in the comments.
You would need to strip out the conditional comments yourself. A specialized hand-written parser might be the best option, as the HAP will only see comments here.
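One possible shape for such a two-pass approach is sketched below in Python's standard library, as an illustration only and not as HTML Agility Pack code. The class names `TagCollector` and `ConditionalCommentScanner` and the `CONDITIONAL` pattern are hypothetical, and the pattern assumes the simple `[if ...]>` ... `<![endif]` comment form from the question. The first pass finds the comments, strips the conditional markers, and the second pass parses what is left as ordinary markup.

```python
import re
from html.parser import HTMLParser

# Matches the "[if ...]>" opener and "<![endif]" closer inside a comment,
# capturing the markup in between.
CONDITIONAL = re.compile(r"^\s*\[if\s[^\]]*\]>(.*?)<!\[endif\]\s*$", re.DOTALL)


class TagCollector(HTMLParser):
    """Second pass: records wanted start tags with their attributes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag in self.wanted:
            self.found.append((tag, dict(attrs)))


class ConditionalCommentScanner(HTMLParser):
    """First pass: finds conditional comments and re-parses their content."""

    def __init__(self, wanted=("link", "script")):
        super().__init__()
        self.wanted = wanted
        self.elements = []

    def handle_comment(self, data):
        match = CONDITIONAL.match(data)
        if match is None:
            return  # an ordinary comment, nothing to extract
        inner = TagCollector(self.wanted)
        inner.feed(match.group(1))
        inner.close()
        self.elements.extend(inner.found)


SAMPLE = """<head>
<!--[if IE 7]>
<link rel="stylesheet" href="/layout/css/IE7.css" />
<![endif]-->
<!--[if IE 9]>
<link rel="stylesheet" href="/layout/css/IE9.css" />
<![endif]-->
</head>"""

scanner = ConditionalCommentScanner()
scanner.feed(SAMPLE)
scanner.close()

for tag, attributes in scanner.elements:
    print(tag, attributes.get("href"))
```

The same idea should carry over to C#: select the comment nodes, strip the markers from each comment's text, and load the remainder into a fresh document so the usual element queries apply.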
Upvotes: 0