Reputation: 1
I have a large amount of XML files and I need to extract specific lines from each file (the same line numbers in each file) and place them in a single Excel document. ie, Line 277 of every xml file in to column 'A' of excel worksheet & line 269 of every xml into column 'B' of excel sheet.
Is this possible and, if so, how do i do it please?
To do this manually would take me several weeks. There are @20 lines i need to copy from each file and several thousand XML files.
If it isn't doable using the above method, is there another solution someone could recommend please.
Huge thanks
I have seen the question on here about detecting specific text within a line (and copying that' but this wouldn't apply in my case as the content of each line (eg Line337) will not always be the same.
I've (hopefully) attached a screenshot of a section of 1 XML. THe only reference to the data i need is in the Line number. ie Line182 will always be 'filename' with the actual filename shown in Line183 (this changes depending on the XML file. Apply this also to Line230 'Creation Date' which stays the same but the actual creation date is on Line231.
I've also attached a excel screenshot of, ideally, how the data needs to be displayed. Hopefully this helps
Upvotes: -1
Views: 136
Reputation: 163625
It looks to me as if the entries can be identified by property name rather than by line number, so a conventional XPath query such as //row[column[1]='filename']/column[2]
should work just fine.
In XPath 3.1 I would do something like
collection('file:///c:/temp/dir?select=*.xml')/
array{for $prop in ('filename', 'nb_streams', 'nb_progress', ...)
return [string(//row[column[1]=$prop]/column[2])}
giving you one array for each row of the desired output, which you can then format into tab-separated values or whatever as required.
Upvotes: 0
Reputation: 34433
Try a simple powershell script
using assembly System.Xml.Linq
$filenames = Get-ChildItem -Path 'c:\temp\test*.xml'
$csvFilename = 'c:\temp\test.csv'
$table = [System.Collections.ArrayList]::new()
foreach($filename in $filenames)
{
$xDoc = [System.Xml.Linq.XDocument]::Load($filename)
$newRow = [PSCustomObject]@{}
$newRow | Add-Member -NotePropertyName Filename -NotePropertyValue $filename
$rows = $xDoc.Descendants('row')
foreach($row in $rows)
{
$elements = @($row.Elements('column'))
$newRow | Add-Member -NotePropertyName $elements[0].Value -NotePropertyValue $elements[1].Value
}
$table.Add($newRow) | out-null
}
$table
$table | Export-CSV -Path $csvFilename -NoTypeInformation
Method 2 which skips the XML ident line when it is not UTF-8
using assembly System.Xml.Linq
$filenames = Get-ChildItem -Path 'c:\temp\test*.xml'
$csvFilename = 'c:\temp\test.csv'
$table = [System.Collections.ArrayList]::new()
foreach($filename in $filenames)
{
$reader = [System.IO.StreamReader]::new($filename)
$reader.ReadLine()
$xDoc = [System.Xml.Linq.XDocument]::Load($reader)
$newRow = [PSCustomObject]@{}
$newRow | Add-Member -NotePropertyName Filename -NotePropertyValue $filename
$rows = $xDoc.Descendants('row')
foreach($row in $rows)
{
$elements = @($row.Elements('column'))
$newRow | Add-Member -NotePropertyName $elements[0].Value -NotePropertyValue $elements[1].Value
}
$table.Add($newRow) | out-null
}
$table
$table | Export-CSV -Path $csvFilename -NoTypeInformation
Upvotes: 0