stout.johnson

Reputation: 25

Parsing an XML file with PowerShell with node from variable

Hello dear fellow PowerShell users,

I'm trying to parse XML files, which can differ in structure. Therefore, I want to access the node values based on a node path that is stored in a variable.

Example

#XML file
$xml = [xml] @'
<node1>
    <node2>
        <node3>
            <node4>test1</node4>
        </node3>
    </node2>
</node1>
'@

Accessing the values directly works.

#access XML node directly -works-
$xml.node1.node2.node3.node4        # working <OK>

Accessing the values via node information from variable does not work.

#access XML node via path from variable -does not work-
$testnodepath = 'node1.node2.node3.node4'

$xml.$testnodepath                  # NOT working
$xml.$($testnodepath)               # NOT working

Is there a way to access the XML node values directly when the node path comes from a variable?

PS: I am aware that this can be done with SelectNodes, but I assume that is inefficient since it is basically searching for keywords.

#Working - but inefficient
$testnodepath = 'node1/node2/node3/node4'
$xml.SelectNodes($testnodepath)

I need a very efficient way of parsing the XML file since I will need to parse huge XML files. Is there a way to directly access the node values in the form $xml.node1.node2.node3.node4 by receiving the node structure from a variable?

Upvotes: 2

Views: 1917

Answers (4)

Tesh

Reputation: 29

I have a similar requirement, except that I need to set values on nodes that are referenced through a variable. We need this so that one script can read different psd1 files and set the information correctly. Hard-coding paths means we need multiple scripts to do the same thing. As you can imagine, this is a nightmare.

... The following works.

[XML]$doc = Get-Content $my_xml_file
$xml_cfg = Import-LocalizedData -FileName xml_information.psd1
$xml_path = "FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id"
$doc.FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id = $xml_cfg.from_id

However, this fails: $doc.$xml_path = $xml_cfg.from_id

ERROR: "The property 'FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id' cannot be found on this object. Verify that the property exists and can be set."

...

It is a real shame PowerShell cannot handle variable references to objects. Referencing objects using variables works fine in Perl, and these sorts of limitations prevent us from migrating all our code to PowerShell.

Upvotes: 0

iRon

Reputation: 23830

You might use the ExecutionContext ExpandString for this:

$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$testnodepath)")
test1

If the node path ($testnodepath) comes from outside (e.g. a parameter), you might want to prevent malicious code injection by stripping off any character that is not a word character or a dot (.):

$securenodepath = $testnodepath -Replace '[^\w\.]'
$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$securenodepath)")
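
As an alternative to silently stripping characters, the path could be rejected outright when it contains anything unexpected. The following is only a minimal sketch, assuming node names consist solely of word characters; Test-NodePath is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Hypothetical helper: accepts only dot-separated runs of word characters,
# e.g. 'node1.node2'. \z anchors at the very end of the string.
function Test-NodePath {
    param([Parameter(Mandatory)] [string] $Path)
    return $Path -match '^\w+(\.\w+)*\z'
}

# Same sample document and path as in the question
$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'
$testnodepath = 'node1.node2.node3.node4'

if (Test-NodePath $testnodepath) {
    # Path looks harmless, so expand it as shown above
    $result = $ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$testnodepath)")
}
else {
    throw "Unexpected characters in node path: $testnodepath"
}

$result
```

Rejecting instead of stripping means a malformed path fails loudly rather than being turned into a different path that might still resolve.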

Upvotes: 2

zett42

Reputation: 27806

I will need to parse huge XML files

The following presents a memory-friendly streaming approach that doesn't require loading the whole XML document (DOM) into memory, so you can parse really huge XML files even if they don't fit into memory. It should also improve parsing speed, as we can simply skip elements that we are not interested in. To accomplish this, we use System.Xml.XmlReader to process XML elements on the fly, while they are read from the file.

I've wrapped the code in a reusable function:

Function Import-XmlElementText( [String] $FilePath, [String[]] $ElementPath ) {

    $stream = $reader = $null

    try {
        $stream = [IO.File]::OpenRead(( Convert-Path -LiteralPath $FilePath )) 
        $reader = [System.Xml.XmlReader]::Create( $stream )

        $curElemPath = ''  # The current location in the XML document

        # While XML nodes are read from the file
        while( $reader.Read() ) {
            switch( $reader.NodeType ) {
                ([System.Xml.XmlNodeType]::Element) {
                    if( -not $reader.IsEmptyElement ) {
                        # Start of a non-empty element -> add to current path
                        $curElemPath += '/' + $reader.Name
                    }
                }
                ([System.Xml.XmlNodeType]::Text) {
                    # Element text -> collect if path matches
                    if( $curElemPath -in $ElementPath ) {
                        [PSCustomObject]@{
                            Path  = $curElemPath
                            Value = $reader.Value
                        }
                    }
                }
                ([System.Xml.XmlNodeType]::EndElement) {
                    # End of element - remove current element from the path
                    $curElemPath = $curElemPath.Substring( 0, $curElemPath.LastIndexOf('/') ) 
                }
            }
        }
    }
    finally {
        if( $reader ) { $reader.Close() }
        if( $stream ) { $stream.Close() }
    }
}

Call it like this:

Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

Given this input XML:

<node1>
    <node2a>
        <node3a>test1</node3a>
        <node3b/>
        <node3c a='b'/>
        <node3d></node3d>
    </node2a>
    <node2b>test2</node2b>
</node1>

This output is produced:

Path                 Value
----                 -----
/node1/node2a/node3a test1
/node1/node2b        test2

The function actually outputs objects, which can be processed by pipeline commands as usual or stored in an array:

$foundElems = Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

$foundElems[1].Value  # Prints 'test2'

Notes:

  • Convert-Path is used to convert a PowerShell path (aka PSPath), which might be relative, to an absolute path that can be used by .NET functions. This is required because .NET uses a different current directory than PowerShell, and a PowerShell path can be in a form that .NET doesn't even understand (e.g. Microsoft.PowerShell.Core\FileSystem::C:\something.txt).
  • When encountering the start of an element, we have to skip empty elements such as <node/>. No EndElement node is reported for such elements, so adding them to the current path ($curElemPath) would leave it invalid: the element would never be removed from the path again.
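
The difference between the two kinds of empty elements can be made visible with a small sketch. It is not part of the function above; it just reads a made-up XML string (the element names root, empty and paired are placeholders) and records which node types the reader reports:

```powershell
# Made-up sample: <empty/> is self-closing, <paired></paired> has an explicit end tag
$xmlText = '<root><empty/><paired></paired></root>'
$reader  = [System.Xml.XmlReader]::Create( [System.IO.StringReader]::new( $xmlText ) )

$events = @()
try {
    while( $reader.Read() ) {
        if( $reader.NodeType -eq [System.Xml.XmlNodeType]::Element ) {
            # Record the start of an element together with its IsEmptyElement flag
            $events += '{0} Element (IsEmptyElement={1})' -f $reader.Name, $reader.IsEmptyElement
        }
        elseif( $reader.NodeType -eq [System.Xml.XmlNodeType]::EndElement ) {
            # Only reported for elements that have a separate end tag
            $events += '{0} EndElement' -f $reader.Name
        }
    }
}
finally {
    $reader.Close()
}

$events
```

The list contains an EndElement entry for paired and root, but none for empty, which is why the function only appends an element to the path when IsEmptyElement is false.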

Upvotes: 1

Mathias R. Jessen

Reputation: 175065

You can split the string containing the property path into individual names and then dereference them one by one:

# define path
$testnodepath = 'node1.node2.node3.node4'

# create a new variable, this will be our intermediary for keeping track of each node/level we've resolved so far
$target = $xml

# now we just loop through each node name in the path
foreach($nodeName in $testnodepath.Split('.')){
  # keep advancing down through the path, 1 node name at a time
  $target = $target.$nodeName
}

# this now resolves to the same value as `$xml.node1.node2.node3.node4`
$target
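
The same walk can probably be adapted to the assignment case raised in another answer: stop one level above the leaf and assign through a dynamic property name. This is only a sketch, assuming the leaf element already exists and holds plain text; Set-XmlValueByPath is a hypothetical name chosen for illustration:

```powershell
# Hypothetical helper: sets the text of an existing leaf element addressed by a dotted path
function Set-XmlValueByPath {
    param(
        [Parameter(Mandatory)] [xml]    $Xml,
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Value
    )

    $names  = $Path.Split('.')
    $leaf   = $names[-1]
    $parent = $Xml

    # Walk down to the parent of the leaf, one node name at a time
    for( $i = 0; $i -lt $names.Length - 1; $i++ ) {
        $parent = $parent.($names[$i])
        if( $null -eq $parent ) {
            throw "Node '$($names[$i])' not found while resolving '$Path'"
        }
    }

    # Assign through the dynamic property name
    $parent.$leaf = $Value
}

# Same sample document as in the question
$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'
Set-XmlValueByPath -Xml $xml -Path 'node1.node2.node3.node4' -Value 'changed'
$xml.node1.node2.node3.node4
```

Because the XML document is a reference type, the change made inside the function is visible to the caller, so the document can afterwards be saved as usual.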

Upvotes: 1
