Pooya

Reputation: 4481

Exception when parsing an XML file using lxml

I wrote this code to validate my XML file against an XSD:

def parseAndObjectifyXml(xmlPath, xsdPath):
    from lxml import etree

    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(xmlinput)  # at this line, the XML input appears to be empty
    schema.assertValid(myxml)

But when I try to validate it, parsing xmlinput finds nothing, even though xmlContent is not empty. What is the problem?

Upvotes: 0

Views: 809

Answers (2)

Martijn Pieters

Reputation: 1124788

Files in Python have a "current position". It starts at the beginning of the file (position 0); as you read the file, the current position pointer moves along until it reaches the end.
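As a minimal sketch of that behaviour, assuming an in-memory io.StringIO stream as a stand-in for the opened XML file (the sample document is made up), a second .read() returns an empty string because the position already sits at the end:

```python
import io

# Hypothetical sample document; io.StringIO stands in for open(xmlPath).
stream = io.StringIO("<root><item/></root>")

start = stream.tell()    # position before anything has been read
first = stream.read()    # reads everything; position moves to the end
end = stream.tell()      # position after the full read
second = stream.read()   # nothing is left after the current position

print(start, end)
print(repr(first))
print(repr(second))
```

Here .tell() reports the current position, so after the first read it equals the length of the text that was consumed.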

You'll need to put that pointer back to the beginning before the lxml parser can read the contents in full. Use the .seek() method for that:

from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    xmlinput.seek(0)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)
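The effect of rewinding can be reproduced without lxml. The sketch below assumes the standard library's xml.etree.ElementTree as a stand-in parser (its parse() also accepts a file object) and a made-up sample document written to a temporary file. Parsing right after .read() fails because the parser sees no data; after .seek(0) it succeeds:

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# Hypothetical sample document written to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b"<root><item/></root>")
    xml_path = tmp.name

try:
    with open(xml_path, "rb") as xml_input:
        xml_content = xml_input.read()   # position is now at the end of the file
        try:
            ET.parse(xml_input)          # the parser receives no data at all
            parse_failed = False
        except ET.ParseError:
            parse_failed = True

        xml_input.seek(0)                # rewind to the beginning
        tree = ET.parse(xml_input)       # the full document is read this time
finally:
    os.remove(xml_path)

print(parse_failed)
print(tree.getroot().tag)
```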

You only need to do this if you need xmlContent somewhere else too. Alternatively, you could pass xmlContent to the .parse() method, wrapped in a StringIO object to provide the necessary file object methods:

from lxml import etree
from cStringIO import StringIO  # Python 2 only; on Python 3 use io.BytesIO or io.StringIO

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(StringIO(xmlContent))
    schema.assertValid(myxml)
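On Python 3 the cStringIO module no longer exists, so the same idea would go through the io module. A rough sketch, assuming the content was read as bytes (for example from a file opened in "rb" mode) and again using the standard library's xml.etree.ElementTree in place of lxml; the sample bytes are made up:

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# Hypothetical content, as it would come back from open(xmlPath, "rb").read().
xml_content = b"<?xml version='1.0' encoding='UTF-8'?><root><item/></root>"

# BytesIO wraps the bytes in a file-like object whose position starts at 0,
# so the parser can read the whole document regardless of earlier reads.
tree = ET.parse(io.BytesIO(xml_content))

root_tag = tree.getroot().tag
child_tags = [child.tag for child in tree.getroot()]
print(root_tag, child_tags)
```

Wrapping bytes rather than decoded text leaves it to the parser to honour the encoding named in the XML declaration.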

If you are not using xmlContent for anything else, you do not need the extra .read() call at all. Omit it, and lxml can parse the file without you having to move the current position pointer back to the start:

from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)

To learn more about .seek() (and its counterpart, .tell()), read up on file objects in the Python tutorial.

Upvotes: 2

Simeon Visser

Reputation: 122516

You should use the XML content that you have read:

xmlContent = xmlinput.read()
myxml = etree.fromstring(xmlContent)  # .parse() expects a filename or file object, not the XML text itself

instead of:

myxml = etree.parse(xmlinput)

Upvotes: -1
