squidpro

Reputation: 1

Parse multiple xml files in a folder

I'm new to Python and still learning. I'm working with XML files (5,754 of them) in a folder. Using the os module, I can print all the file names without a problem. I can also parse a single file and write it to a CSV file without a problem. What I can't get working is parsing all the files in the folder. Please help, and thank you. Code snippets and the full code are below.

This works fine and prints all 5,754 file names:

import os
for path, dirs, files in os.walk(r"C:\Users\dan\Desktop\parse"):
    for f in files:
        clinical = os.path.join(path, f)
        print(clinical)

OUTPUT:

C:\Users\dan\Desktop\toparse\ABC0000xxxx\ ABC 00009932.xml
C:\Users\dan\Desktop\toparse\ ABC 0000xxxx\ ABC 00009945.xml
C:\Users\dan\Desktop\toparse\ ABC 0000xxxx\ ABC 00009958.xml

Working code that parses one file and writes it to CSV:

import csv
import xml.etree.ElementTree as ET
import os

tree = ET.parse("ABC00000102.xml")
root = tree.getroot()

with open('names.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)

    for child in root.iter():
        key = child.tag
        value = child.text
        writer.writerow([key, value])

My code for the whole folder prints all the file names, but then fails with the error shown below the code:

import csv
import xml.etree.ElementTree as ET
import os

with open('names.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)

for path, dirs, files in os.walk(r"C:\Users\dan\Desktop\parse"):
    for f in files:
        clinical = os.path.join(path, f)
        print(clinical)

tree = ET.parse("clinical")
root = tree.getroot()

for child in root.iter():
    key = child.tag
    value = child.text
    writer.writerow([key, value])

Errors:

Traceback (most recent call last):
  File "C:/Users/dan/PycharmProjects/clinicals/example.py", line 14, in <module>
    tree = ET.parse("clinical")
  File "C:\Users\dan\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "C:\Users\dan\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 587, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'clinical'

Process finished with exit code 1

Upvotes: 0

Views: 887

Answers (1)

AmphotericLewisAcid

Reputation: 2299

You're calling ET.parse("clinical"), which looks for a file named "clinical" in the directory you're currently working in.

If you want it to open the path specified by the clinical variable, you'll need to call ET.parse(clinical) instead (no quotes, so Python passes the variable's value rather than the literal string).

Also, if your intention is to parse every file found, you'll need to fix your indentation to make sure it happens in the for-loop. Currently, your code will only ever parse the last file it finds in the directory, because your parsing happens after the loop.
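Putting both fixes together, here is a minimal sketch of how the loop could look. The function name xml_folder_to_csv is hypothetical, and the .xml filter and the skip on malformed files are assumptions on my part rather than something your question asked for. One more thing the sketch changes: the writer is used inside the with block, because once that block ends the CSV file is closed and writer.writerow would raise a ValueError.

```python
import csv
import os
import xml.etree.ElementTree as ET


def xml_folder_to_csv(folder, csv_path):
    """Parse every .xml file under `folder` and write tag/text rows to `csv_path`.

    Hypothetical helper for illustration; returns the number of files parsed.
    """
    parsed = 0
    # Keep the CSV file open for the whole walk so the writer stays usable.
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for path, dirs, files in os.walk(folder):
            for f in files:
                # Assumption: only files ending in .xml should be parsed.
                if not f.lower().endswith(".xml"):
                    continue
                clinical = os.path.join(path, f)
                try:
                    # Pass the variable, not the string "clinical".
                    tree = ET.parse(clinical)
                except ET.ParseError as err:
                    # Assumption: skip malformed files instead of stopping.
                    print(f"Skipping {clinical}: {err}")
                    continue
                # Parsing happens inside the inner loop, once per file.
                for child in tree.getroot().iter():
                    writer.writerow([child.tag, child.text])
                parsed += 1
    return parsed


# Example call, using the folder from the question:
# xml_folder_to_csv(r"C:\Users\dan\Desktop\parse", "names.csv")
```

All rows from all files end up in one names.csv. If you need to know which file a row came from, you could add clinical as a third column in the writerow call.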

Upvotes: 1
