shakex

Reputation: 15

Can't delete parent node from XML file with a special character

I have this XML:

<Configuration>
  <Files>
    <File>
      <filename>"%test%\PKI\U"</filename>
    </File>
    <File>
      <filename>"%test%\PKI\G"</filename>
    </File>
    <File>
      <filename>"%test%\SDM"</filename>
    </File>
  </Files>
</Configuration>

I want it to look like this (remove the File nodes that contain PKI):

<Configuration>
  <Files>
    <File>
      <filename>"%test%\SDM"</filename>
    </File>
  </Files>
</Configuration>

Here is the code I have, but it's not working:

$xmlFilePath = '\path\to\file.xml'
[xml]$xmlDoc = Get-Content -Path $xmlFilePath

$fileNodesToRemove = $xmlDoc.SelectNodes("//File[filename[contains(., 'pki')]]")

foreach ($node in $fileNodesToRemove) {
    $node.ParentNode.RemoveChild($node)
}

$xmlDoc.Save($xmlFilePath)

How can I solve this?

Upvotes: 1

Views: 65

Answers (2)

mklement0

Reputation: 439277

Santiago Squarzon has provided the crucial pointer:

  • XML, and therefore also XPath, is case-sensitive:

    • So, therefore, are the .NET APIs related to [xml] (System.Xml.XmlDocument).

    • Therefore, you must search for substring 'PKI' rather than 'pki' to find the nodes of interest, i.e. you must match that substring's case exactly:

      $xmlDoc.SelectNodes('//File[filename[contains(., "PKI")]]')
      
  • By contrast, PowerShell's adaptation of the XML DOM is case-insensitive.

    • E.g., ([xml] '<PKI>foo</PKI>').pki works (outputs 'foo').
    • In essence, PowerShell allows you to treat any parsed [xml] document as an object graph that you can drill into using regular dot notation, because PowerShell surfaces XML (child) elements and XML attributes as namespace-less properties on each object (XML node) in the graph.
      And because all property (member) access in PowerShell is case-insensitive - as PowerShell generally is by default - child elements can be targeted case-insensitively too.
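To see the difference in action, here is a minimal sketch that parses the question's sample XML in memory and compares the 'pki' and 'PKI' XPath queries with PowerShell's case-insensitive dot notation. The translate()-based query is an addition not mentioned above: since System.Xml implements XPath 1.0, which has no lower-case() function, translate() is the usual way to fold case when a case-insensitive match is wanted. The variable names are purely illustrative.

```powershell
# Parse the question's sample document in memory (no file needed).
$xmlDoc = [xml] @'
<Configuration>
  <Files>
    <File><filename>"%test%\PKI\U"</filename></File>
    <File><filename>"%test%\PKI\G"</filename></File>
    <File><filename>"%test%\SDM"</filename></File>
  </Files>
</Configuration>
'@

# XPath is case-sensitive: 'pki' matches nothing, 'PKI' matches the two target nodes.
$lowerCount = $xmlDoc.SelectNodes('//File[filename[contains(., "pki")]]').Count
$upperCount = $xmlDoc.SelectNodes('//File[filename[contains(., "PKI")]]').Count

# XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z before comparing.
$ciXPath = '//File[filename[contains(translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "pki")]]'
$ciCount = $xmlDoc.SelectNodes($ciXPath).Count

# PowerShell's dot notation ignores case: all-lowercase names still resolve.
$viaDotNotation = $xmlDoc.configuration.files.file.Count
$dotDemo = ([xml] '<PKI>foo</PKI>').pki

"pki: $lowerCount, PKI: $upperCount, translate(): $ciCount, dot notation: $viaDotNotation, dot demo: $dotDemo"
```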

A streamlined reformulation of your code, using Select-Xml:

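# Be sure to specify the file's full path, because .NET's working directory
# usually differs from PowerShell's.
$xmlFilePath = 'C:\path\to\file.xml'

Select-Xml -LiteralPath $xmlFilePath -XPath '//File[filename[contains(., "PKI")]]' |
  ForEach-Object { $node = $_.Node; $null = $node.ParentNode.RemoveChild($node) }

# ForEach-Object runs its script block in the caller's scope, so $node is still
# set here; it is $null if nothing matched, in which case there's nothing to save.
if ($null -ne $node) { $node.OwnerDocument.Save($xmlFilePath) }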

Note:

  • The XML file path is passed directly to Select-Xml, which internally parses it into a [xml] instance and applies the XPath query passed to -XPath.

    • The output objects are wrappers around the matching XML nodes, hence the need to access the latter via $_.Node
  • By letting Select-Xml parse the XML file, the brittleness of [xml]$xmlDoc = Get-Content -Path $xmlFilePath is avoided:

    • The latter can result in mis-decoding of XML documents that use character encodings that PowerShell doesn't recognize by default - see the bottom section of this answer for details.
  • Slight caveat:

    • When a file path is passed to -LiteralPath / -Path, Select-Xml unexpectedly parses the file with preservation of significant whitespace; in the case at hand, this leaves two empty lines behind, one for each removed element.

    • If this is a concern, you can replace -LiteralPath $xmlFilePath with -Content (Get-Content -Raw $xmlFilePath), but note that you must then again ensure that Get-Content recognizes the character encoding of the XML file correctly.

      • Alternatively, stick with your original approach (with the case correction applied), but robustly parse the XML file as follows:

        ($xmlDoc = [xml]::new()).Load($xmlFilePath)
        

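Putting those pieces together, the following is a sketch of the original approach with the case correction and the robust XmlDocument.Load() parsing applied. It assumes a throwaway demo file in the temp directory (the name pki-demo.xml is hypothetical) standing in for the real file. It also collects the matches with @(...) before removing any, as a precaution: the .NET documentation for SelectNodes() notes that the returned XmlNodeList is only guaranteed to be valid while the document remains unchanged.

```powershell
# Hypothetical demo file in the temp directory, holding the question's sample XML.
# A full path is used, because .NET's working directory usually differs from PowerShell's.
$xmlFilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'pki-demo.xml'
Set-Content -LiteralPath $xmlFilePath -Encoding UTF8 -Value @'
<Configuration>
  <Files>
    <File>
      <filename>"%test%\PKI\U"</filename>
    </File>
    <File>
      <filename>"%test%\PKI\G"</filename>
    </File>
    <File>
      <filename>"%test%\SDM"</filename>
    </File>
  </Files>
</Configuration>
'@

# Let XmlDocument.Load() read the file, so that the XML parser, not Get-Content,
# determines the character encoding.
($xmlDoc = [xml]::new()).Load($xmlFilePath)

# Case-exact XPath ('PKI', not 'pki'); @(...) collects all matches up front.
foreach ($node in @($xmlDoc.SelectNodes('//File[filename[contains(., "PKI")]]'))) {
  # RemoveChild() returns the removed node; assigning to $null discards it.
  $null = $node.ParentNode.RemoveChild($node)
}
$xmlDoc.Save($xmlFilePath)

# Re-read the saved file and list the <filename> values that are left.
($check = [xml]::new()).Load($xmlFilePath)
$remaining = @($check.SelectNodes('//File/filename') | ForEach-Object InnerText)
$remaining

Remove-Item -LiteralPath $xmlFilePath   # clean up the demo file
```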
Upvotes: 1

jdweng

Reputation: 34421

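Using LINQ to XML (System.Xml.Linq):

# Load the LINQ to XML assembly (already loaded in PowerShell 7; harmless there).
Add-Type -AssemblyName System.Xml.Linq

$filename = 'c:\temp\test.xml'

$doc = [System.Xml.Linq.XDocument]::Load($filename)

$files = $doc.Root.Element('Files')
# @(...) snapshots the elements first, because removing an element while
# Elements() is still being enumerated can end the enumeration early.
foreach ($file in @($files.Elements('File')))
{
   # Use a separate variable so that the file path in $filename isn't overwritten.
   $value = $file.Element('filename').Value
   if ($value.Contains('PKI')) { $file.Remove() }
}

$doc.Save($filename)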
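The LINQ to XML route can also be tried out in memory. The sketch below parses the question's sample with XDocument.Parse() and, instead of the case-exact Contains('PKI'), uses String.IndexOf() with StringComparison.OrdinalIgnoreCase, so that the asker's lowercase 'pki' matches too; that case-insensitive twist is an assumption about what might be wanted, not part of the answer above. The matching elements are collected first and removed afterwards, so nothing is removed while Elements() is still being enumerated.

```powershell
Add-Type -AssemblyName System.Xml.Linq

# Parse the question's sample document in memory.
$doc = [System.Xml.Linq.XDocument]::Parse(@'
<Configuration>
  <Files>
    <File><filename>"%test%\PKI\U"</filename></File>
    <File><filename>"%test%\PKI\G"</filename></File>
    <File><filename>"%test%\SDM"</filename></File>
  </Files>
</Configuration>
'@)

# Collect the <File> elements whose <filename> contains 'pki' in any casing.
# String.IndexOf(string, StringComparison) exists on both .NET Framework and .NET (Core).
$toRemove = @(
  $doc.Root.Element('Files').Elements('File') |
    Where-Object { $_.Element('filename').Value.IndexOf('pki', [StringComparison]::OrdinalIgnoreCase) -ge 0 }
)

# Remove them only after the enumeration above has completed.
$toRemove | ForEach-Object { $_.Remove() }

$doc.ToString()
```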

Upvotes: -1
