What is the correct XPath syntax to match both attributes and elements?
More Info
I created the function below to find elements and attributes that contain a given value:
function Get-XPathToValue {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
    process {
        $Xml.SelectNodes("//*[.='{0}']" -f ($Value -replace "'","''")) | %{
            $xpath = ''
            $elem = $_
            while (($elem -ne $null) -and ($elem.NodeType -ne 'Document')) {
                $xpath = '/' + $elem.Name + $xpath
                $elem = $elem.SelectSingleNode('..')
            }
            $xpath
        }
    }
}
This matches elements, but not attributes.
By replacing $Xml.SelectNodes("//*[.='{0}']"
with $Xml.SelectNodes("//@*[.='{0}']"
I can match attributes, but not elements.
Example
[xml]$sampleXml = @"
<root>
    <child1>
        <child2 attribute1='hello'>
            <ignoreMe>what</ignoreMe>
            <child3>hello</child3>
            <ignoreMe2>world</ignoreMe2>
        </child2>
        <child2Part2 attribute2="ignored">hello</child2Part2>
    </child1>
    <notMe>
        <norMe>Not here</norMe>
    </notMe>
</root>
"@
Get-XPathToValue -Xml $sampleXml -Value 'hello'
Returns:
/root/child1/child2/child3
/root/child1/child2Part2
Should Return:
/root/child1/child2/attribute1
/root/child1/child2/child3
/root/child1/child2Part2
What have you tried?
I tried matching on:
//@*|*[.='{0}']
- returns matching elements, but all attributes.
//*|@*[.='{0}']
- returns matching attributes, but all elements.
//*[.='{0}']|@*[.='{0}']
- returns matching elements.
//@*[.='{0}']|*[.='{0}']
- returns matching attributes.
//(@*|*)[.='{0}']
- throws an exception.
Reputation: 338316
Your method of deriving an XPath expression has three flaws, as indicated in the comments to your question.
Here is my take on a function that addresses these points (I also gave it a name that I think is more appropriate within the cmdlet naming scheme):
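function Convert-ValueToXpath {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
    process {
        # XPath 1.0 string literals cannot escape quotes, so build a concat()
        # call that splices each apostrophe in as a double-quoted piece.
        $escapedValue = "concat('', '" + ($Value -split "'" -join "', ""'"", '") + "')"
        # Match elements and attributes whose whitespace-normalized string value equals the search value.
        $Xml.SelectNodes("(//*|//@*)[normalize-space() = {0}]" -f $escapedValue) | % {
            $xpath = ''
            $elem = $_
            while ($true) {
                if ($elem.NodeType -eq "Attribute") {
                    # Attributes get an @ prefix; continue from the owning element.
                    $xpath = '/@' + $elem.Name
                    $elem = $elem.OwnerElement
                } elseif ($elem.ParentNode) {
                    # 1-based position among siblings with the same local name and namespace.
                    $precedingExpr = "./preceding-sibling::*[local-name() = '$($elem.LocalName)' and namespace-uri() = '$($elem.NamespaceURI)']"
                    $pos = $elem.SelectNodes($precedingExpr).Count + 1
                    $xpath = '/' + $elem.Name + "[" + $pos + "]" + $xpath
                    $elem = $elem.ParentNode
                } else {
                    # Reached the document node.
                    break
                }
            }
            $xpath
        }
    }
}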
For your sample input I get these XPaths:
/root[1]/child1[1]/child2[1]/@attribute1
/root[1]/child1[1]/child2[1]/child3[1]
/root[1]/child1[1]/child2Part2[1]
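The concat() escaping used in the function can be tried on its own. XPath 1.0 string literals have no escape sequence for quotes, so doubling an apostrophe does not produce a valid literal. The sketch below wraps the same split/join expression in a helper; the name ConvertTo-XPathLiteral and the two-element test document are made up for illustration and are not part of the function above.

```powershell
# Hypothetical helper: wraps the split/join expression from the answer.
function ConvertTo-XPathLiteral {
    param (
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Value
    )
    # Split on apostrophes, then rejoin the single-quoted pieces with a
    # double-quoted apostrophe ("'") between them, all inside concat().
    "concat('', '" + ($Value -split "'" -join "', ""'"", '") + "')"
}

# Assumed test document: one value with an apostrophe, one without.
[xml]$doc = '<r><a>it''s</a><a>hello</a></r>'

$literal = ConvertTo-XPathLiteral "it's"
$literal                                        # concat('', 'it', "'", 's')

$matchCount = $doc.SelectNodes("//*[. = $literal]").Count
$matchCount                                     # 1
```

Values without an apostrophe pass through the same code path and simply become concat('', 'value').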
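The positional predicates such as [1] matter once siblings share a name. As a rough illustration, assuming a made-up document with two <a> siblings, the same preceding-sibling count used in the function yields an index that makes the path unambiguous:

```powershell
# Assumed sample document with two same-named siblings.
[xml]$doc = '<r><a>x</a><a>hello</a></r>'

$target = $doc.SelectSingleNode("//a[.='hello']")

# Position = number of preceding siblings with the same name, plus one.
$pos = $target.SelectNodes("./preceding-sibling::*[local-name() = 'a']").Count + 1

$ambiguous = $doc.SelectNodes('/r/a').Count             # 2: path without index hits both
$exact     = $doc.SelectNodes("/r[1]/a[$pos]").Count    # 1: indexed path hits only the match

"$pos $ambiguous $exact"                                # 2 2 1
```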
Reputation: 24470
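Using the following XPath resolved the issue: //@*[.='{0}']|//*[.='{0}']
Each side of the union needs its own leading //. Without it, the second operand is evaluated relative to the context node (here the document) instead of across the whole tree. In the function, the element side uses ./text() rather than ., so an element matches only when one of its own text nodes equals the value: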
function Get-XPathToValue {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
    process {
        # Note: doubling apostrophes is not valid XPath 1.0 escaping, so a value containing ' will fail here.
        $Xml.SelectNodes("//@*[.='{0}']|//*[./text()='{0}']" -f ($Value -replace "'","''")) | %{
            $xpath = ''
            $elem = $_
            while (($elem -ne $null) -and ($elem.NodeType -ne 'Document')) {
                # Mark attribute steps with @ so they are distinguishable from elements.
                $prefix = ''
                if($elem.NodeType -eq 'Attribute'){$prefix = '@'}
                $xpath = '/' + $prefix + $elem.Name + $xpath
                $elem = $elem.SelectSingleNode('..')
            }
            $xpath
        }
    }
}
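The difference between the working union and the earlier attempts can be checked by counting the nodes each expression selects. This is a small sketch against a trimmed-down, single-line version of the sample XML (an assumption made to keep it short), with 'hello' substituted for the {0} placeholder:

```powershell
# Trimmed version of the question's sample document (assumed for brevity).
[xml]$doc = '<root><child1><child2 attribute1="hello"><ignoreMe>what</ignoreMe><child3>hello</child3></child2><child2Part2 attribute2="ignored">hello</child2Part2></child1></root>'

# Both operands start with //, so both search the whole tree.
$both  = $doc.SelectNodes("//@*[.='hello'] | //*[.='hello']").Count

# Second operand is relative: only children of the document node are tested.
$attrs = $doc.SelectNodes("//@*[.='hello'] | *[.='hello']").Count

# Second operand is relative: the document node has no attributes.
$elems = $doc.SelectNodes("//*[.='hello'] | @*[.='hello']").Count

"$both $attrs $elems"    # 3 1 2
```

The union operator binds more loosely than the path steps, so each side of | is a complete path expression of its own.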