ray
ray

Reputation: 8699

How do I get a list of child nodes from XML

Given the following sample script (saved as sample.txt on my hard drive):

DECLARE @xVar XML
SET @xVar = 
  '<bookstore>
      <book>
        <title>Writing Secure Code</title>
        <author>
          <first-name>Michael</first-name>
          <last-name>Howard</last-name>
        </author>
        <author>
          <first-name>David</first-name>
          <last-name>LeBlanc</last-name>
        </author>
        <price>39.99</price>
      </book>
      <book>
        <title>Old Man and the sea</title>
        <author>
          <first-name>Earnest</first-name>
          <last-name>Hemmingway</last-name>
        </author>
        <price> 9.99</price>
      </book>
    </bookstore>
    '

SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
       nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM   @xVar.nodes('//author') AS R(nref)
WHERE  nref.exist('.[first-name != "David"]') = 1

I want to extract the XML and determine if any child nodes have been added to node book. For this use case, assume that author is the new node.

I have written a script that works, but it seems grossly inefficient:

Set-StrictMode -Version Latest

cls
#The list of fields that I want to find
[array]$expected_fields = "title", "price"

#extract the xml from sample.txt and put it in an xml variable
[string]$file_script = Get-Content "C:\PowerShellScripts\sample.txt"
[int]$start_pos = $file_script.IndexOf("'") + 1
[int]$end_pos = $file_script.SubString($start_pos + 1).IndexOf("'") + 1
[xml]$xml_result = $file_script.SubString($start_pos,$end_pos)

#NOTE:  THIS IS THE PART THAT FEELS WRONG
#Convert the xml snipput into CSV file and then get the headers (which is the only thing I want)
$export_file_name = "C:\PowerShellScripts\test.csv"
Select-Xml 'child::bookstore/book' $xml_result  | Select-Object -expand Node | Export-Csv $export_file_name -NoTypeInformation -Delimiter:"`t" -Encoding:UTF8 
[string]$field_names = Get-Content $export_file_name | Select-Object -first 1
Remove-Item "C:\Users\Jennifer\Google Drive\PowerShellScripts\test.csv"
[array]$found_fields = $field_names.Replace("""","").Split("`t")

#report new fields
foreach ($specific_field in $found_fields) {
    if ($expected_fields -notcontains $specific_field)
    {
        Write-Host "New field found:" $specific_field
    }
}

Is there a better way to populate *$found_fields* rather than creating a CSV file, storing the first row in a variable and then deleting the CSV file?

Upvotes: 5

Views: 38576

Answers (1)

Filburt
Filburt

Reputation: 18061

Try changing your -Expand from Node to Name (and Where-Object to exclude title and price)

$xml_result.SelectNodes("bookstore/book/*") | Select-Object -Expand Name | Where-Object { ($_ -ne "title") -and ($_ -ne "price") }

This will give you any unexpected child nodes of book.

Upvotes: 5

Related Questions