Error with Python and XML

Question

I'm getting an error when trying to grab a value from my XML. I get "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration."

Here is my code:

import requests
import lxml.etree
from requests.auth import HTTPBasicAuth

r = requests.get("https://somelinkhere/folder/?parameter=abc", auth=HTTPBasicAuth('username', 'password'))
print r.text

root = lxml.etree.fromstring(r.text)
textelem = root.find("opensearch:totalResults")
print textelem.text

I'm getting this error:

Traceback (most recent call last):
  File "tickets2.py", line 8, in <module>
    root = lxml.etree.fromstring(r.text)
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:82934)
  File "src/lxml/parser.pxi", line 1814, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:124471)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Here is what the XML looks like. I'm trying to grab the value on the last line.


  Feed from some link here
  
  
  https://somelinkhere/folder/?parameter=abc
  2018-03-06T17:48:09Z
  company.com
  2018-03-06T17:48:09Z
  4

I have tried various changes from links like https://twigstechtips.blogspot.com/2013/06/python-lxml-strings-with-encoding.html and http://makble.com/how-to-parse-xml-with-python-and-lxml but I keep running into the same error.

Daniel Haley · Accepted Answer

Instead of r.text, which guesses at the text encoding and decodes the body to a Unicode string, try r.content, which gives you the response body as bytes. (See http://docs.python-requests.org/en/latest/user/quickstart/#response-content.)

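You could also use r.raw, a file-like object (make the request with stream=True) that lxml.etree.parse can read. See the question "parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)" for more info.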
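To make the difference between the two attributes concrete, here is a minimal sketch that needs no network access. The payload is a hypothetical stand-in for the real response: body_bytes plays the role of r.content and body_text the role of r.text.

```python
import lxml.etree

# Hypothetical payload standing in for the real response body.
body_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<feed><title>Feed from some link here</title></feed>"
)
# Roughly what r.text gives you: the same document decoded to a Unicode string.
body_text = body_bytes.decode("utf-8")

# A decoded string that still carries an encoding declaration is rejected.
try:
    lxml.etree.fromstring(body_text)
    text_error = None
except ValueError as exc:
    text_error = str(exc)

# The undecoded bytes (the role r.content plays) parse without complaint.
root = lxml.etree.fromstring(body_bytes)
title = root.find("title").text

print(text_error)
print(title)
```

Passing bytes lets the parser honor the encoding declaration itself, which is why the bytes form is the one lxml asks for in the error message.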

Once that issue is fixed, you'll run into a namespace issue. The element you're trying to find (opensearch:totalResults) has the prefix opensearch, which is bound to the URI http://a9.com/-/spec/opensearch/1.1/.

You can find the element by combining the namespace URI and the local name (Clark notation):

{http://a9.com/-/spec/opensearch/1.1/}totalResults

See http://lxml.de/tutorial.html#namespaces for more info.
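As a rough illustration, the sketch below looks up the element in two equivalent ways: with Clark notation, and with a prefix map passed through the namespaces argument of find. The feed is a hypothetical cut-down document, not the real response, and the fallback import is only there so the sketch also runs where lxml is not installed (the standard library's ElementTree accepts the same two lookups).

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml; both lookups work the same way.
    import xml.etree.ElementTree as etree

# Hypothetical cut-down feed; only the opensearch namespace is declared.
feed_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">\n'
    b"  <title>Feed from some link here</title>\n"
    b"  <opensearch:totalResults>4</opensearch:totalResults>\n"
    b"</feed>"
)

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
root = etree.fromstring(feed_bytes)

# Option 1: Clark notation, {namespace-uri}local-name.
clark_hit = root.find("{%s}totalResults" % OPENSEARCH_NS)

# Option 2: keep the prefix in the path and tell find() what it maps to.
prefix_hit = root.find(
    "opensearch:totalResults",
    namespaces={"opensearch": OPENSEARCH_NS},
)

print(clark_hit.text)
print(prefix_hit.text)
```

The prefix in the namespaces mapping is local to your code. It does not have to match the prefix used in the document, as long as it maps to the same URI.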

Here's an example with both changes implemented:

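# Namespace URI bound to the "opensearch" prefix, in Clark notation
opensearch = "{http://a9.com/-/spec/opensearch/1.1/}"

# r.content is bytes, so the encoding declaration is no longer a problem
root = lxml.etree.fromstring(r.content)
textelem = root.find(opensearch + "totalResults")
print(textelem.text)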
