pahnin

Reputation: 5588

XML parsing in Python

The XML I fetch over the network looks like this:

<?xml version='1.0' ?><liverequestresponse><liverequesttime>180</liverequesttime><livemessage></livemessage></liverequestresponse>

and my Python minidom code is:

import urllib, urllib2, time
from xml.dom.minidom import parse
response = urllib2.urlopen(req)
the_page = response.read() 
#print the_page 
dom = parse(response)
name = dom.getElementsByTagNameNS('liverequestresponse')
print name[0].nodeValue

This gives an error, whereas

print the_page

works fine.

Also, if there are any other libraries that are better than minidom, please tell me. I would prefer one that comes pre-installed on Linux.

UPDATE

errors

Traceback (most recent call last):
  File "logout.py", line 18, in <module>
    dom = parse(response)
  File "/usr/lib64/python2.7/xml/dom/minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 211, in parseFile
    parser.Parse("", True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0

Upvotes: 1

Views: 5860

Answers (2)

mata

Reputation: 69032

If you call response.read() before parse(response), you have already consumed the content of the response. A second call to response.read() (which is what parse does internally) returns an empty string, so the parser receives no XML at all and raises the "no element found" error.
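
To see the effect without a network call, here is a minimal sketch. It assumes io.BytesIO as a stand-in for the urllib2 response (both are file-like streams) and reuses the XML from the question; it is written so that it should run on Python 2.7 as well as Python 3.

```python
from io import BytesIO
from xml.dom.minidom import parse
from xml.parsers.expat import ExpatError

# The XML from the question, as raw bytes.
xml_bytes = (b"<?xml version='1.0' ?><liverequestresponse>"
             b"<liverequesttime>180</liverequesttime>"
             b"<livemessage></livemessage></liverequestresponse>")

# BytesIO is only a stand-in for the real response object (an assumption
# made for illustration); like the response, it is consumed as it is read.
response = BytesIO(xml_bytes)

first_read = response.read()    # consumes the whole stream
second_read = response.read()   # nothing left, so this is empty

# Parsing the exhausted stream fails because the parser gets no data.
error_message = None
try:
    parse(response)
except ExpatError as exc:
    error_message = str(exc)

print(error_message)
```

The first read returns the full document, the second returns an empty byte string, and parse() then fails in the same way as in the question's traceback.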

The simplest solution is to just drop the first response.read call. But if you really need the response string for some reason, you could try:

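import urllib, urllib2, time
import StringIO
from xml.dom.minidom import parse

response = urllib2.urlopen(req)   # req is built as in the question
the_page = response.read()        # the stream is consumed here
#print the_page
# Wrap the string in a fresh file-like object so parse() has data to read
dom = parse(StringIO.StringIO(the_page))
name = dom.getElementsByTagName('liverequesttime')
text = name[0].firstChild         # the text node inside <liverequesttime>
print text.nodeValue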
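
As a possible variation, minidom also provides parseString, which takes the already-read content directly, so the StringIO wrapper is not strictly needed. The sketch below assumes the_page holds the XML from the question (hard-coded here instead of being fetched) and should run on Python 2.7 and Python 3.

```python
from xml.dom.minidom import parseString

# Stand-in for the result of response.read(); hard-coded for illustration.
the_page = (b"<?xml version='1.0' ?><liverequestresponse>"
            b"<liverequesttime>180</liverequesttime>"
            b"<livemessage></livemessage></liverequestresponse>")

dom = parseString(the_page)

# The text lives in a child text node, not on the element itself.
time_nodes = dom.getElementsByTagName('liverequesttime')
value = time_nodes[0].firstChild.nodeValue

# An empty element has no child node at all, so check before using it.
message_node = dom.getElementsByTagName('livemessage')[0].firstChild
message = message_node.nodeValue if message_node is not None else None

print(value)
print(message)
```

Note that the empty livemessage element has no firstChild, so accessing nodeValue on it without the check would raise an AttributeError.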

Upvotes: 3

Diego Navarro

Reputation: 9704

An approach with lxml, a third-party library (so it may need to be installed separately) that is widely used in Python for parsing XML and performs well:

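import urllib2
from contextlib import closing
from lxml import etree

# In Python 2 the object returned by urllib2.urlopen() is not a context
# manager, so wrap it in closing() to use it in a with statement.
with closing(urllib2.urlopen(req)) as f:
    xml = etree.parse(f)

print xml.find('.//liverequesttime').text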

The output of the last line would be: 180
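
Since the question asks for something pre-installed, a similar lookup can be sketched with xml.etree.ElementTree from the standard library. This is only an illustration: the_page is hard-coded here as a stand-in for the fetched response body, and the code should run on Python 2.7 and Python 3.

```python
import xml.etree.ElementTree as ET

# Stand-in for the result of response.read(); hard-coded for illustration.
the_page = (b"<?xml version='1.0' ?><liverequestresponse>"
            b"<liverequesttime>180</liverequesttime>"
            b"<livemessage></livemessage></liverequestresponse>")

# fromstring() returns the root element (<liverequestresponse>).
root = ET.fromstring(the_page)

# find() with a plain tag name searches the direct children of the root.
value = root.find('liverequesttime').text

# An empty element has no text, so .text is None here.
message = root.find('livemessage').text

print(value)
print(message)
```

The find/text interface is close to the lxml one, so switching between the two later should need few changes.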

Upvotes: 1
