Naga
Naga

Reputation: 367

Parse XML to extract data with grouping in PowerShell

Please assist me to extract data from XML, I am scarmbling to find a logic with little knowledge in Powershell script. I need this logic to be implemented without installing additional modules/libraries in powershell.

I need to get the maximum priority in the XML grouped by KEY along with HITS (H) count.

Script shared below by @roadrunner works as expected which is great!, but when I run for larger XML file (2GB xml file), it takes long time to process, is there something we can do to mutli-thread and reduce the processing time ?

<ABC-FOF-PROCESS>
<H>
 <PRIORITY>-14</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>-14</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>-3</PRIORITY>
 <KEY>D6306210-CF424F11-8E2D3496-E6CE1CA7</KEY>
</H>
<H>
 <PRIORITY>1</PRIORITY>
 <KEY>D6306210-CF424F11-8E2D3496-E6CE1CA7</KEY>
</H>
<H>
 <PRIORITY>-3</PRIORITY>
 <KEY>4EFR02B4-ADFDAF12-3C123II2-ADAFADFD</KEY>
</H>
<H>
 <PRIORITY>-14</PRIORITY>
 <KEY>5D2702B2-ECE8F1FB-3CEC3229-5FE4C4BC</KEY>
</H>
</ABC-FOF-PROCESS>

For Example logic of expected output should be something like this:

1nd and 2nd Hit tag has Priority -14 and grouped into one common KEY, and the max prioirty is -14.

Output Slab for first 2 hits:

PRIOIRTY KEY  HITS  
-14        1     2

3rd and 4th Hit tag has maximum Priority 1 with one common KEY. Output Slab:

PRIOIRTY KEY  HITS  
1        1     2

5th H tag has one Priority -3 with one common KEY. Output Slab:

PRIOIRTY KEY  HITS  
-3       1     1

6th H tag has one Priority -3 with one common KEY. Output Slab:

PRIOIRTY KEY  HITS  
-14       1     1

FINAL OUTPUT (Aggregated):

PRIOIRTY KEY  HITS  
1        1    2
-3       1    1
-14      2    3

Test Case:

<ABC-FOF-PROCESS>
<H>
 <PRIORITY>0</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>-2</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>-2</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>6</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>3</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>-3</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
 <PRIORITY>3</PRIORITY>
 <KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
</ABC-FOF-PROCESS>

Actual Output:

PRIORITY KEY HITS
-------- --- ----
       6   1    1
       3   1    2
       0   1    1
      -2   1    2
      -3   1    1

Expected Output: Only the maximum priority should be picked up summing up all the hits for a common key (F637A146-3437AB82-BA659D4A-17AC7FBF)

PRIORITY KEY HITS
-------- --- ----
       6   1    7

In my OP 3rd and 4th Hit tag has maximum Priority 1 (-3 > 1) with one common KEY (D6306210-CF424F11-8E2D3496-E6CE1CA7).

Upvotes: 2

Views: 399

Answers (1)

RoadRunner
RoadRunner

Reputation: 26315

You can group by PRIORITY with Group-Object, then calculate the KEY(number of unique keys found) and HITS(total number of keys found) and insert these properties into a System.Management.Automation.PSCustomObject. Then you can sort the final result by PRIORITY with Sort-Object.

For loading the XML, I use New-Object to create a System.Xml.XmlDocument object, then I load the data from the file with System.Xml.XmlDocument.Load. The other way to do this is with $xml = [xml](Get-Content -Path test.xml).

# Create XML object to load data into
$xml = New-Object -TypeName System.Xml.XmlDocument

# Load in XML file
$xml.Load("test.xml")

# Group XML child nodes by Priority
$groups = $xml.'ABC-FOF-PROCESS'.ChildNodes | Group-Object -Property PRIORITY

# Iterate groups and create PSCustomObject for each grouping
& {
    foreach ($group in $groups)
    {
        # Priority is group name
        # Key is just the number of unique keys found
        # Hits is the total count of the keys
        [PSCustomObject]@{
            PRIORITY = [int]$group.Name
            KEY = ($group.Group.KEY | Select-Object -Unique).Count
            HITS = $group.Count
        } 
    }
} | Sort-Object -Property PRIORITY -Descending # Sort final output by priority 

Output:

PRIORITY KEY HITS
-------- --- ----
       1   1    1
      -3   2    2
     -14   2    3

To output the above to a text file, we can pipe the the output to Out-File:

& {
    foreach ($group in $groups)
    {
        [PSCustomObject]@{
            PRIORITY = [int]$group.Name
            KEY = ($group.Group.KEY | Select-Object -Unique).Count
            HITS = $group.Count
        }
    }
} | Sort-Object -Property PRIORITY -Descending | Out-File -FilePath output.txt
# Pipe output here

Upvotes: 1

Related Questions