scott_liv01

Reputation: 25

How to parse multiple xml files and pass through attributes into csv?

I need to parse a directory of XML files into one large CSV file. From each file I need two attributes of the 'Param' element: 'Name' and 'PNum'. The directory also contains a file called Content.xml, from which I can get the names of all the other XML files and use each one as FileName. The problem is that I cannot figure out how to read these attributes from every XML file: each file is organised differently, and some don't seem to have these attributes at all.

I have written code that works for one of the XML files in the directory and outputs a CSV file with all the relevant information:

import xml.etree.ElementTree as ET
import csv
import os

FileName = '------.xml'
tree = ET.parse(FileName)
root = tree.getroot()[4]

# newline='' keeps the csv module from adding extra blank rows on Windows
csv_out = open('CsvOut', 'w', newline='')

csvwriter = csv.writer(csv_out)

# write the header once, before the data rows
csv_head = ['Generation', 'Parameter Name', 'Parameter Number']
csvwriter.writerow(csv_head)

gen = FileName[:-4]  # file name without the '.xml' extension
for child in root:
    # put gen itself in the row; appending a list here would nest it in the cell
    parameters = [gen, child.get('Name'), child.get('PNum')]
    csvwriter.writerow(parameters)

csv_out.close()

Upvotes: 2

Views: 583

Answers (1)

Anwarvic

Reputation: 12992

It's rather simple, and you can do it in two steps:

  • First, enumerate all the XML files in the directory
  • Then run your code over each of these files

import xml.etree.ElementTree as ET
import csv
import os
from glob import glob

# create csv writer (newline='' keeps the csv module from adding blank rows on Windows)
csv_out = open('CsvOut', 'w', newline='')
csvwriter = csv.writer(csv_out)
# write the header
csv_head = ['Generation', 'Parameter Name', 'Parameter Number']
csvwriter.writerow(csv_head)

# iterate over the xml files in the current directory
for FileName in glob("*.xml"):
    tree = ET.parse(FileName)
    # as in the question, this assumes the Param elements sit under the fifth child of the root
    root = tree.getroot()[4]
    gen = FileName[:-4]  # file name without the '.xml' extension
    for child in root:
        # put gen itself in the row; appending a list here would nest it in the cell
        parameters = [gen, child.get('Name'), child.get('PNum')]
        csvwriter.writerow(parameters)

# after iterating, close the csv file
csv_out.close()
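
Since the question mentions that the files are organised differently and that some lack the attributes, the positional index `getroot()[4]` may not hold for every file. A minimal sketch of one possible workaround follows: it searches the whole tree for 'Param' elements with `iter()` and skips any that are missing 'Name' or 'PNum'. The function names `collect_param_rows` and `write_params_csv` and the `skip` parameter are hypothetical, chosen for illustration, and the sketch assumes the files do not use XML namespaces (otherwise the tag would not be plain 'Param').

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_param_rows(xml_path):
    """Return [generation, Name, PNum] rows for every Param element in one file."""
    generation = Path(xml_path).stem  # file name without the .xml extension
    rows = []
    # iter('Param') walks the whole tree, so the depth of the element does not matter
    for param in ET.parse(str(xml_path)).iter('Param'):
        name = param.get('Name')
        num = param.get('PNum')
        # skip Param elements that lack either attribute
        if name is None or num is None:
            continue
        rows.append([generation, name, num])
    return rows


def write_params_csv(directory, csv_path, skip=('Content.xml',)):
    """Write one CSV covering every XML file in directory except those named in skip."""
    with open(csv_path, 'w', newline='') as csv_out:
        writer = csv.writer(csv_out)
        writer.writerow(['Generation', 'Parameter Name', 'Parameter Number'])
        for xml_file in sorted(Path(directory).glob('*.xml')):
            # Content.xml only lists the other files, so it is left out by default
            if xml_file.name in skip:
                continue
            writer.writerows(collect_param_rows(xml_file))
```

Calling `write_params_csv('.', 'CsvOut')` would then cover the current directory. Files with no matching 'Param' elements simply contribute no rows, rather than raising an error.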

Upvotes: 1
