Reputation: 11
I currently have a folder containing more than 100,000 xml files. I wrote a function that parses xml data. This function uses xmltodict package.
def parse_xml(file_path):
with open(file_path,'rb') as f:
dict_data = xmltodict.parse(f.read())
This function works when I copy and paste each individual file's name into the function. I'm trying to write a function that will direct and parse the whole folder without manually typing each name.
for files in os.listdir('/Users/dp/Dropbox/Data/Moody-xbrl'):
parse_xml(files)
Running the code gives me
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xmltodict.py", line 378, in parse
parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 0
Why does this happen and what should I do to resolve it?
Upvotes: 0
Views: 101
Reputation: 1
My problem was that in the folder where I was searching for the XML there were hidden files that were not XML.
Upvotes: 0