Leighski
Leighski

Reputation: 1

Is there a way to copy lines (ie Line337) from BBedit to Excel (automated) from a number of files

I have a large amount of XML files and I need to extract specific lines from each file (the same line numbers in each file) and place them in a single Excel document. ie, Line 277 of every xml file in to column 'A' of excel worksheet & line 269 of every xml into column 'B' of excel sheet.

Is this possible and, if so, how do i do it please?

To do this manually would take me several weeks. There are @20 lines i need to copy from each file and several thousand XML files.

If it isn't doable using the above method, is there another solution someone could recommend please.

Huge thanks

I have seen the question on here about detecting specific text within a line (and copying that' but this wouldn't apply in my case as the content of each line (eg Line337) will not always be the same.

I've (hopefully) attached a screenshot of a section of 1 XML. THe only reference to the data i need is in the Line number. ie Line182 will always be 'filename' with the actual filename shown in Line183 (this changes depending on the XML file. Apply this also to Line230 'Creation Date' which stays the same but the actual creation date is on Line231.

I've also attached a excel screenshot of, ideally, how the data needs to be displayed. Hopefully this helps

XML screenshot

Excel screenshot

Upvotes: -1

Views: 136

Answers (2)

Michael Kay
Michael Kay

Reputation: 163625

It looks to me as if the entries can be identified by property name rather than by line number, so a conventional XPath query such as //row[column[1]='filename']/column[2] should work just fine.

In XPath 3.1 I would do something like

collection('file:///c:/temp/dir?select=*.xml')/
  array{for $prop in ('filename', 'nb_streams', 'nb_progress', ...)
        return [string(//row[column[1]=$prop]/column[2])}

giving you one array for each row of the desired output, which you can then format into tab-separated values or whatever as required.

Upvotes: 0

jdweng
jdweng

Reputation: 34433

Try a simple powershell script

using assembly System.Xml.Linq 

$filenames = Get-ChildItem -Path 'c:\temp\test*.xml'
$csvFilename = 'c:\temp\test.csv'

$table = [System.Collections.ArrayList]::new()
foreach($filename in $filenames)
{
   $xDoc = [System.Xml.Linq.XDocument]::Load($filename)
   $newRow = [PSCustomObject]@{}
   $newRow | Add-Member -NotePropertyName Filename -NotePropertyValue $filename
   $rows = $xDoc.Descendants('row')
   foreach($row in $rows)
   {
      $elements = @($row.Elements('column'))
      $newRow | Add-Member -NotePropertyName $elements[0].Value -NotePropertyValue $elements[1].Value
   }
   $table.Add($newRow) | out-null
}
$table
$table | Export-CSV -Path $csvFilename -NoTypeInformation

Method 2 which skips the XML ident line when it is not UTF-8

using assembly System.Xml.Linq 

$filenames = Get-ChildItem -Path 'c:\temp\test*.xml'
$csvFilename = 'c:\temp\test.csv'

$table = [System.Collections.ArrayList]::new()
foreach($filename in $filenames)
{
   $reader = [System.IO.StreamReader]::new($filename)
   $reader.ReadLine()
   $xDoc = [System.Xml.Linq.XDocument]::Load($reader)
   $newRow = [PSCustomObject]@{}
   $newRow | Add-Member -NotePropertyName Filename -NotePropertyValue $filename
   $rows = $xDoc.Descendants('row')
   foreach($row in $rows)
   {
      $elements = @($row.Elements('column'))
      $newRow | Add-Member -NotePropertyName $elements[0].Value -NotePropertyValue $elements[1].Value
   }
   $table.Add($newRow) | out-null
}
$table
$table | Export-CSV -Path $csvFilename -NoTypeInformation

Upvotes: 0

Related Questions