Reputation: 367
Please assist me to extract data from XML, I am scarmbling to find a logic with little knowledge in Powershell script. I need this logic to be implemented without installing additional modules/libraries in powershell.
I need to get the maximum priority in the XML grouped by KEY along with HITS (H) count.
Script shared below by @roadrunner works as expected which is great!, but when I run for larger XML file (2GB xml file), it takes long time to process, is there something we can do to mutli-thread and reduce the processing time ?
<ABC-FOF-PROCESS>
<H>
<PRIORITY>-14</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>-14</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>-3</PRIORITY>
<KEY>D6306210-CF424F11-8E2D3496-E6CE1CA7</KEY>
</H>
<H>
<PRIORITY>1</PRIORITY>
<KEY>D6306210-CF424F11-8E2D3496-E6CE1CA7</KEY>
</H>
<H>
<PRIORITY>-3</PRIORITY>
<KEY>4EFR02B4-ADFDAF12-3C123II2-ADAFADFD</KEY>
</H>
<H>
<PRIORITY>-14</PRIORITY>
<KEY>5D2702B2-ECE8F1FB-3CEC3229-5FE4C4BC</KEY>
</H>
</ABC-FOF-PROCESS>
For Example logic of expected output should be something like this:
1nd and 2nd Hit tag has Priority -14 and grouped into one common KEY, and the max prioirty is -14.
Output Slab for first 2 hits:
PRIOIRTY KEY HITS
-14 1 2
3rd and 4th Hit tag has maximum Priority 1 with one common KEY. Output Slab:
PRIOIRTY KEY HITS
1 1 2
5th H tag has one Priority -3 with one common KEY. Output Slab:
PRIOIRTY KEY HITS
-3 1 1
6th H tag has one Priority -3 with one common KEY. Output Slab:
PRIOIRTY KEY HITS
-14 1 1
FINAL OUTPUT (Aggregated):
PRIOIRTY KEY HITS
1 1 2
-3 1 1
-14 2 3
Test Case:
<ABC-FOF-PROCESS>
<H>
<PRIORITY>0</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>-2</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>-2</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>6</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>3</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>-3</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
<H>
<PRIORITY>3</PRIORITY>
<KEY>F637A146-3437AB82-BA659D4A-17AC7FBF</KEY>
</H>
</ABC-FOF-PROCESS>
Actual Output:
PRIORITY KEY HITS
-------- --- ----
6 1 1
3 1 2
0 1 1
-2 1 2
-3 1 1
Expected Output: Only the maximum priority should be picked up summing up all the hits for a common key (F637A146-3437AB82-BA659D4A-17AC7FBF)
PRIORITY KEY HITS
-------- --- ----
6 1 7
In my OP 3rd and 4th Hit tag has maximum Priority 1 (-3 > 1) with one common KEY (D6306210-CF424F11-8E2D3496-E6CE1CA7).
Upvotes: 2
Views: 399
Reputation: 26315
You can group by PRIORITY
with Group-Object
, then calculate the KEY
(number of unique keys found) and HITS
(total number of keys found) and insert these properties into a System.Management.Automation.PSCustomObject
. Then you can sort the final result by PRIORITY
with Sort-Object
.
For loading the XML, I use New-Object
to create a System.Xml.XmlDocument
object, then I load the data from the file with System.Xml.XmlDocument.Load
. The other way to do this is with $xml = [xml](Get-Content -Path test.xml)
.
# Create XML object to load data into
$xml = New-Object -TypeName System.Xml.XmlDocument
# Load in XML file
$xml.Load("test.xml")
# Group XML child nodes by Priority
$groups = $xml.'ABC-FOF-PROCESS'.ChildNodes | Group-Object -Property PRIORITY
# Iterate groups and create PSCustomObject for each grouping
& {
foreach ($group in $groups)
{
# Priority is group name
# Key is just the number of unique keys found
# Hits is the total count of the keys
[PSCustomObject]@{
PRIORITY = [int]$group.Name
KEY = ($group.Group.KEY | Select-Object -Unique).Count
HITS = $group.Count
}
}
} | Sort-Object -Property PRIORITY -Descending # Sort final output by priority
Output:
PRIORITY KEY HITS
-------- --- ----
1 1 1
-3 2 2
-14 2 3
To output the above to a text file, we can pipe the the output to Out-File
:
& {
foreach ($group in $groups)
{
[PSCustomObject]@{
PRIORITY = [int]$group.Name
KEY = ($group.Group.KEY | Select-Object -Unique).Count
HITS = $group.Count
}
}
} | Sort-Object -Property PRIORITY -Descending | Out-File -FilePath output.txt
# Pipe output here
Upvotes: 1