ddp97

Reputation: 11

Why do I get a parser error when running on the whole folder, but not when I run one file at a time?

I currently have a folder containing more than 100,000 XML files. I wrote a function that parses XML data using the xmltodict package.

import xmltodict

def parse_xml(file_path):
    with open(file_path,'rb') as f:
        dict_data = xmltodict.parse(f.read())
    return dict_data

This function works when I copy and paste each individual file's name into it. I'm trying to write a loop that goes through the whole folder and parses every file, so I don't have to type each name manually.

for files in os.listdir('/Users/dp/Dropbox/Data/Moody-xbrl'):
    parse_xml(files)

Running the code gives me

File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xmltodict.py", line 378, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 0

Why does this happen and what should I do to resolve it?

Upvotes: 0

Views: 101

Answers (1)

Jotinha

Reputation: 1

My problem was that the folder I was searching for XML files also contained hidden files that were not XML.
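If hidden files are also the cause here, a minimal sketch along these lines might help. It assumes the hidden entries start with a dot (for example the `.DS_Store` file that macOS Finder creates) and that the real data files end in `.xml`. The names `iter_xml_paths` and `parse_folder` are hypothetical helpers, not part of xmltodict. The sketch also joins the folder onto each name, since `os.listdir` returns bare file names rather than full paths.

```python
import os
from xml.parsers.expat import ExpatError


def iter_xml_paths(folder):
    """Yield full paths of visible .xml files in folder (hypothetical helper)."""
    for name in sorted(os.listdir(folder)):
        # Skip hidden entries such as .DS_Store and anything that is not .xml.
        if name.startswith('.') or not name.lower().endswith('.xml'):
            continue
        # os.listdir returns bare names, so build the full path explicitly.
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            yield path


def parse_folder(folder):
    """Parse every XML file in folder; return (results, failures)."""
    # Third-party package, imported here so iter_xml_paths works without it.
    import xmltodict

    results, failures = {}, []
    for path in iter_xml_paths(folder):
        try:
            with open(path, 'rb') as f:
                results[path] = xmltodict.parse(f.read())
        except ExpatError as exc:
            # Record the offending file instead of stopping the whole run.
            failures.append((path, str(exc)))
    return results, failures
```

Calling `parse_folder('/Users/dp/Dropbox/Data/Moody-xbrl')` would then return the parsed dictionaries keyed by path, plus a list of any files that still fail to parse, which makes it easier to inspect those files individually.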

Upvotes: 0
