Reputation: 1
I'm new to Python and still learning. I'm working with XML files (5,754 of them) in a folder. Using the os module, I can print all the file names without a problem. I can also parse one file and write it to a CSV file without a problem. My problem is parsing all the files in the folder. Please help, and thank you. Code snippets and the full code are below.
This works fine and prints all 5,754 file names
import os
for path, dirs, files in os.walk(r"C:\Users\dan\Desktop\parse"):
    for f in files:
        clinical = os.path.join(path, f)
        print(clinical)
OUTPUT:
C:\Users\dan\Desktop\toparse\ABC0000xxxx\ ABC 00009932.xml
C:\Users\dan\Desktop\toparse\ ABC 0000xxxx\ ABC 00009945.xml
C:\Users\dan\Desktop\toparse\ ABC 0000xxxx\ ABC 00009958.xml
Working code: parse one file and write to CSV
import csv
import xml.etree.ElementTree as ET
import os
tree = ET.parse("ABC00000102.xml")
root = tree.getroot()
with open('names.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for child in root.iter():
        key = child.tag
        value = child.text
        writer.writerow([key, value])
My code for the whole folder prints all the file names but then fails with the error shown below the code:
import csv
import xml.etree.ElementTree as ET
import os
with open('names.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for path, dirs, files in os.walk(r"C:\Users\dan\Desktop\parse"):
        for f in files:
            clinical = os.path.join(path, f)
            print(clinical)
    tree = ET.parse("clinical")
    root = tree.getroot()
    for child in root.iter():
        key = child.tag
        value = child.text
        writer.writerow([key, value])
Errors:
Traceback (most recent call last):
  File "C:/Users/dan/PycharmProjects/clinicals/example.py", line 14, in <module>
    tree = ET.parse("clinical")
  File "C:\Users\dan\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "C:\Users\dan\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 587, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'clinical'
Process finished with exit code 1
Upvotes: 0
Views: 887
Reputation: 2299
You're calling ET.parse("clinical"), which looks for a file literally named "clinical" in your current working directory.
If you want it to open the path stored in the clinical variable instead, you'll need to call ET.parse(clinical), without the quotes.
Also, if your intention is to parse every file found, you'll need to fix your indentation to make sure it happens in the for-loop. Currently, your code will only ever parse the last file it finds in the directory, because your parsing happens after the loop.
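Putting both fixes together, here is a minimal sketch of what the loop could look like. The function name xml_folder_to_csv is made up for illustration, and two things in it are my own assumptions rather than anything from your code: it skips files that don't end in .xml, and it skips (and reports) any file that fails to parse instead of stopping the whole run.

```python
import csv
import os
import xml.etree.ElementTree as ET


def xml_folder_to_csv(folder, csv_path):
    """Walk `folder`, parse every .xml file, and write tag/text rows to `csv_path`.

    Hypothetical helper for illustration. Returns the number of files parsed.
    """
    parsed = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for path, dirs, files in os.walk(folder):
            for f in files:
                # Assumption: only files with an .xml extension should be parsed.
                if not f.lower().endswith(".xml"):
                    continue
                clinical = os.path.join(path, f)
                try:
                    # Pass the variable, not the string "clinical".
                    tree = ET.parse(clinical)
                except ET.ParseError as exc:
                    # Assumption: a malformed file is reported and skipped.
                    print(f"Skipping {clinical}: {exc}")
                    continue
                # Parsing and writing now happen inside the inner for-loop,
                # so every file is processed, not just the last one.
                for child in tree.getroot().iter():
                    writer.writerow([child.tag, child.text])
                parsed += 1
    return parsed
```

With your paths, the call would be something like xml_folder_to_csv(r"C:\Users\dan\Desktop\parse", "names.csv"). The CSV is opened once, outside the loops, so rows from all files end up in the same file.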
Upvotes: 1