Reputation: 5686
I have an XML file with the following (simplified) structure:
<XML>
<Observation>
<Dimension value="2018-11-01" />
<Value value="123" />
</Observation>
<Observation>
<Dimension value="2018-11-02" />
<Value value="456" />
</Observation>
<Observation>
<Dimension value="2018-12-01" />
<Value value="789" />
</Observation>
<Observation>
<Dimension value="2018-12-02" />
<Value value="222" />
</Observation>
</XML>
The task at hand is to delete every Observation node whose date (the value attribute of its Dimension node) is not the maximum date. In other words: only the nodes containing the maximum/highest date in the value attribute of the Dimension node should be kept. This should be done per month.
Hence, the result should look as follows:
<XML>
<Observation>
<Dimension value="2018-11-02" />
<Value value="456" />
</Observation>
<Observation>
<Dimension value="2018-12-02" />
<Value value="222" />
</Observation>
</XML>
How can this be done in PowerShell? I know how to read an XML file and how to make XPath-based queries:
$doc.SelectNodes("//Observation", $ns)
However, I do not know how to a) determine the maximum/highest date per month, or b) delete the nodes that do not contain that date.
EDIT:
Another, maybe easier, way of doing this would be as follows:
Upvotes: 0
Views: 370
Reputation: 1244
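Grouping by month using Group-Object simplifies the process. Within each group, sort by date in descending order, skip the first (latest) entry, and remove the rest. Because the dates are in yyyy-MM-dd format, sorting them as strings also sorts them chronologically.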
$doc.XML.Observation | Group-Object { $_.Dimension.value.Substring(0,7) } | foreach {
    $_.Group | Sort-Object { $_.Dimension.value } -Descending |
        Select-Object -Skip 1 | foreach { $doc.XML.RemoveChild($_) }
}
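To try the grouping approach without a file on disk, here is a minimal, self-contained sketch that loads the sample document from the question via a here-string. The variable name $remaining is only for illustration, and [void] is added to suppress the nodes that RemoveChild would otherwise write to the output.

```powershell
# Sample document copied from the question
[xml]$doc = @'
<XML>
  <Observation><Dimension value="2018-11-01" /><Value value="123" /></Observation>
  <Observation><Dimension value="2018-11-02" /><Value value="456" /></Observation>
  <Observation><Dimension value="2018-12-01" /><Value value="789" /></Observation>
  <Observation><Dimension value="2018-12-02" /><Value value="222" /></Observation>
</XML>
'@

# Group by the "yyyy-MM" prefix, keep the latest date in each group, remove the rest
$doc.XML.Observation |
    Group-Object { $_.Dimension.value.Substring(0, 7) } |
    ForEach-Object {
        $_.Group |
            Sort-Object { $_.Dimension.value } -Descending |
            Select-Object -Skip 1 |
            ForEach-Object { [void]$_.ParentNode.RemoveChild($_) }
    }

# Collect the dates that are left in the document
$remaining = @($doc.XML.Observation | ForEach-Object { $_.Dimension.value })
$remaining   # 2018-11-02 and 2018-12-02
```

Calling $doc.Save("outputfile.xml") afterwards would write the pruned document; the file name is taken from the other answer and is just a placeholder.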
If the observations are spread across multiple parent nodes, process each parent separately and remove every node through its own ParentNode:
$doc.SelectNodes("//message:DataSet/generic:Series", $ns) | foreach {
    $_.SelectNodes("./generic:Obs", $ns) | Group-Object { $_.ObsDimension.value.Substring(0,7) } | foreach {
        $_.Group | Sort-Object { $_.ObsDimension.value } -Descending |
            Select-Object -Skip 1 | foreach { $_.ParentNode.RemoveChild($_) }
    }
}
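The snippet above relies on a namespace manager in $ns that is not shown. A rough sketch of how it might be set up follows. The sample document and both namespace URIs (urn:example:message, urn:example:generic) are hypothetical placeholders, so substitute the URIs declared in the real file.

```powershell
# Hypothetical sample; the element layout and namespace URIs are assumptions
[xml]$doc = @'
<message:Root xmlns:message="urn:example:message" xmlns:generic="urn:example:generic">
  <message:DataSet>
    <generic:Series>
      <generic:Obs><generic:ObsDimension value="2018-11-01" /></generic:Obs>
      <generic:Obs><generic:ObsDimension value="2018-11-02" /></generic:Obs>
    </generic:Series>
    <generic:Series>
      <generic:Obs><generic:ObsDimension value="2018-12-01" /></generic:Obs>
      <generic:Obs><generic:ObsDimension value="2018-12-02" /></generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:Root>
'@

# Map the prefixes used in the XPath expressions to the document's namespace URIs
$ns = New-Object System.Xml.XmlNamespaceManager -ArgumentList $doc.NameTable
$ns.AddNamespace('message', 'urn:example:message')
$ns.AddNamespace('generic', 'urn:example:generic')

# The prefixed queries now resolve
$series = $doc.SelectNodes('//message:DataSet/generic:Series', $ns)
$obs = $doc.SelectNodes('//generic:Obs', $ns)
```

The prefixes passed to AddNamespace only need to match the ones used in the XPath strings; they do not have to match the prefixes in the file, as long as the URIs are the same.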
Upvotes: 1
Reputation: 1782
This should do exactly what you want:
Add-Type -AssemblyName System.Collections
$filePath = "inputfile.xml"
$filePath1 = "outputfile.xml"
# Load through XmlDocument.Load so that the PreserveWhitespace setting is actually applied
$xmlContent = New-Object System.Xml.XmlDocument
$xmlContent.PreserveWhitespace = $true
$xmlContent.Load( $filePath )
[System.Collections.Generic.List[string]]$highestValues = @()
$oldMonth = ""
$oldYear = ""
# Walk the dates in descending order; the first date seen for each year/month is that month's maximum
$xmlContent.XML.Observation.Dimension | Sort-Object { $_.value } -Descending | ForEach-Object {
    $currentDate = $_.value
    $currentYear = $currentDate.Substring(0,4)
    $currentMonth = $currentDate.Substring(5,2)
    if( $currentYear -ne $oldYear -or $currentMonth -ne $oldMonth ) {
        $oldYear = $currentYear
        $oldMonth = $currentMonth
        $highestValues.Add( $currentDate )
    }
}
# Iterate backwards so that removals do not shift the indexes still to be visited
$numItems = ($xmlContent.XML.Observation.Dimension).Count
for( $i = $numItems - 1; $i -ge 0; $i-- ) {
    if( !$highestValues.Contains( $xmlContent.XML.Observation.Dimension[$i].value ) ) {
        [void]$xmlContent.XML.RemoveChild( $xmlContent.XML.Observation[$i] )
    }
}
[void]$xmlContent.Save( $filePath1 )
Upvotes: 0