Reputation: 493
I'm a PowerShell and XPath beginner struggling to efficiently parse through some XML and build up an array of objects for further processing (e.g. CSV output, SQL Server load). A sample of the XML is included below as well as the code snippet that I'm currently using. In this schema, each object-array
represents a single row in the desired output. I'm parsing the MetaData
children to get the proper names of the columns, then building a collection of PSObjects where each object in the array represents a single row. The MetaData
information is used to find the column names (PSObject properties).
This works fine for files with 10K rows or so, but bogs down horribly when run against my largest files with over 500K rows. In these cases each row is taking around 3-4 seconds to process. At 500K rows, that's a looong time to run. Is there some magic around XPath or PS variable assignment that I can use to speed this up?
The immediate need to is translate this XML into a CSV (currently performed via export-csv
), but I'd prefer to have this portion of the script generate a collection of objects as I'll next be looking to either load this data into a SQL Server instance or do other processing.
Thanks for the help!
David
<Report>
<Data>
<Columns>
<MetaData>
<Index>0</Index>
<Name>Column1</Name>
<Index>1</Index>
<Name>Column2</Name>
<Index>2</Index>
<Name>Column3</Name>
</MetaData>
</Columns>
<Rows>
<object-array>
<string>column1 value</string>
<int>column2 value</string>
<string>column3 value</string>
</object-array>
</Rows>
</Data>
</Report>
#extract the column headers
[string[]]$ColumnHeaders = @()
$obj.SelectNodes("/Report/Data/Columns/MetaData") |% {$ColumnHeaders += $_.name}
$collection = @()
$rowint = 0
$rowcount = $obj.Report.Data.Rows."object-array".count
#unwind the rows
do {
$hash=@{}
#loop through each element in the row parent element and add it to the hash
$columnint = 0
$columncount = (Select-Xml -xPath "Report/Data/Rows/object-array[$rowint]/node()" $obj).count
do {
$hash.Add($columnheaders[$columnint], (Select-Xml -xPath "Report/Data/Rows/object-array[$rowint]/descendant::text()[$columnint]" $obj).Node.Value)
$columnint++
} while ($columnint -lt $columncount)
$thisrow = New-Object PSObject -Property $hash
#add this new row to the collection
$collection += $thisrow
$rowint++
} while ($rowint -lt $rowcount)
Upvotes: 4
Views: 4752
Reputation: 126722
You can get MetaData names without re-creating ColumnHeaders in each itreation:
$ColumnHeaders = $obj.Report.Data.Columns.MetaData.Name
Same applies to $collection. How the end result of your code looks like?
UPDATE: Give this a try
[xml]$obj = Get-Content test.xml
$data = $obj.Report.Data
$pso = New-Object PSObject
$pso | Add-Member NoteProperty -Name $data.Columns.MetaData.Name[0] -Value $data.Rows.'object-array'.string[0]
$pso | Add-Member NoteProperty -Name $data.Columns.MetaData.Name[1] -Value $data.Rows.'object-array'.int
$pso | Add-Member NoteProperty -Name $data.Columns.MetaData.Name[2] -Value $data.Rows.'object-array'.string[1] -PassThru
Upvotes: 1