karthik

Reputation: 43

Error while parsing lxml

While parsing XML with lxml, I get the error "reading file objects must return bytes objects". Here's the code:

from lxml import etree
from io import StringIO
def parseXML(xmlFile):
    """
    parse the xml
    """
    data=open(xmlFile)
    xml=data.read()
    data.close()

    tree=etree.parse(StringIO(xml))
    context=etree.iterparse(StringIO(xml))
    for action, elem in context:
        if not elem.text:
            if not elem.text:
                text="None"
            else:
                text=elem.text
            print(elem.tag + "=>" + text)
if __name__ == "__main__":
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")

BGM xml

<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
    <appointment>
        <begin>1234360800</begin>
        <duration>1800</duration>
        <subject>Check MS Office website for updates</subject>
        <location></location>
        <uid>604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid>
        <state>dismissed</state>
    </appointment>
</zAppointments>

Error:

Traceback (most recent call last):
  File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module>
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")
  File "C:/Users/karthik/source/ChartAttributes/crecords", line 26, in parseXML
    for action, elem in context:
  File "src\lxml\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:150010)
  File "src\lxml\iterparse.pxi", line 193, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:149708)
  File "src\lxml\iterparse.pxi", line 221, in lxml.etree.iterparse._read_more_events (src\lxml\lxml.etree.c:150208)
TypeError: reading file objects must return bytes objects

Process finished with exit code 1

Upvotes: 2

Views: 1238

Answers (1)

Toby Speight

Reputation: 30965

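I think iterparse needs the XML as bytes rather than as a character string. A StringIO hands it str objects when it reads, which is what the TypeError is complaining about.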
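To make the difference concrete, here is a small sketch using only the standard library (the variable names are just for illustration). It shows what read() returns from each kind of in-memory file object:

```python
from io import BytesIO, StringIO

# A text buffer returns str from read() ...
text_chunk = StringIO("<a/>").read()
# ... while a binary buffer returns bytes.
bytes_chunk = BytesIO(b"<a/>").read()

print(type(text_chunk).__name__)   # prints: str
print(type(bytes_chunk).__name__)  # prints: bytes
```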

Open the file in binary mode so that read() returns a bytes object (you would then also need to wrap it in io.BytesIO rather than StringIO):

    data=open(xmlFile, 'rb')
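Put together, that approach might look roughly like the sketch below. The names iter_tag_text and parse_xml are made up for this example, and it assumes you want "None" printed for elements without text, as in the question:

```python
from io import BytesIO

from lxml import etree


def iter_tag_text(xml_bytes):
    """Yield (tag, text) per element, substituting "None" for empty text."""
    # iterparse wants a file-like object whose read() returns bytes,
    # so the data is wrapped in BytesIO rather than StringIO.
    for _action, elem in etree.iterparse(BytesIO(xml_bytes)):
        yield elem.tag, elem.text or "None"


def parse_xml(xml_file):
    # Binary mode ("rb") makes read() return bytes instead of str.
    with open(xml_file, "rb") as data:
        xml = data.read()
    for tag, text in iter_tag_text(xml):
        print(tag + "=>" + text)
```

By default iterparse only reports "end" events, so child elements are yielded before their parent.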

But it's probably easier to pass the filename to lxml and let it take care of opening and reading the file:

from lxml import etree

def parseXML(xmlFile):
    for action, elem in etree.iterparse(xmlFile):
        text = elem.text or "None"
        print(elem.tag + "=>" + text)

Upvotes: 2
