Aidan

Reputation: 57

Parsing a folder of XML files using glob and lxml

I'm having some difficulty parsing a folder of valid XML files (*.ditamap) using Python 3 and lxml.

The error returned is

"lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1"

My code:

import glob
import lxml.etree as et

for file in glob.glob('*.ditamap'):
    with open(file) as xml_file:
        #tree = et.parse("0579182.ditamap")
        tree = et.parse(xml_file)
        print (et.tostring(tree, pretty_print=True))

et.parse works when I pass a filename directly, but not when I pass the file variable.

What am I doing wrong? It seems like there is some kind of IO error or type mismatch, but I cannot see what I am doing wrong.

Upvotes: 0

Views: 1133

Answers (1)

jsmolka

Reputation: 800

et.parse expects a file name, but you are giving it an opened file. Try passing the path returned by glob directly instead:

import glob
import lxml.etree as et

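for f in glob.glob('*.ditamap'):
    # f is the path string from glob; et.parse opens the file itself
    tree = et.parse(f)
    # tostring returns bytes by default, so decode before printing
    print(et.tostring(tree, pretty_print=True).decode())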

You may want to consider using glob.iglob, which yields matches lazily instead of building a full list, because you only iterate over the results once.
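A minimal sketch of the glob.iglob variant, in case it helps. The function name iter_trees and its folder and pattern parameters are hypothetical, chosen only for illustration. The fallback import of the standard library's xml.etree.ElementTree is also an addition, so that the snippet still runs where lxml is not installed.

```python
import glob
import os

try:
    import lxml.etree as et
except ImportError:
    # Assumption: fall back to the standard library parser if lxml is missing
    import xml.etree.ElementTree as et


def iter_trees(folder=".", pattern="*.ditamap"):
    """Lazily yield (path, tree) for every file in folder matching pattern."""
    # iglob returns an iterator, so paths are produced one at a time
    for path in glob.iglob(os.path.join(folder, pattern)):
        # Passing the path string lets the parser open the file itself
        yield path, et.parse(path)
```

Looping with "for path, tree in iter_trees():" then gives one parsed tree at a time for the current directory.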

Edit: I overlooked that et.parse can also accept file objects. Give it a try nevertheless.
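If you do want to keep passing a file object, one thing worth trying is opening the file in binary mode, so the parser receives raw bytes and handles the encoding declaration itself. This is only a sketch: whether text mode is what triggers the "Document is empty" error in your case is an assumption and is not confirmed here. The function name parse_open_files is hypothetical, and the fallback import of xml.etree.ElementTree is an addition for environments without lxml.

```python
import glob

try:
    import lxml.etree as et
except ImportError:
    # Assumption: fall back to the standard library parser if lxml is missing
    import xml.etree.ElementTree as et


def parse_open_files(pattern="*.ditamap"):
    """Parse each matching file via a file object and map path -> root tag."""
    root_tags = {}
    for path in glob.glob(pattern):
        # "rb" gives the parser bytes instead of already-decoded text
        with open(path, "rb") as xml_file:
            tree = et.parse(xml_file)
        root_tags[path] = tree.getroot().tag
    return root_tags
```

Calling parse_open_files() in the folder that holds the .ditamap files returns a dict from each path to the tag of its root element, which is a quick way to check that every file parsed.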

Upvotes: 1
