Saltigué

Reputation: 59

Parsing XML from a string into an Element

I am having trouble parsing XML from a string directly into an Element. I have an XML file that I have transformed into a string:

resp = requests.post(request_url, request_string, proxies=urllib.getproxies(), stream=True)

As recommended here: https://stackoverflow.com/a/25023776/1551810, I used the content instead of the text:

response_tree = ET.fromstring(resp.content)

Apparently there is a syntax error in the XML file:

XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xB0 0x20 0x4E 0x6F, line 12, column 35

I tried this to encode the content but to no avail:

ET.fromstring(resp.content.encode('utf8'))

I get the same XMLSyntaxError as before. Can anyone help me? I have already spent two hours on this.

Upvotes: 0

Views: 112

Answers (1)

Saltigué

Reputation: 59

I finally found a great library that helped me solve the problem: cchardet (https://pypi.python.org/pypi/cchardet/0.3.5). I also followed @deets's advice.

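import cchardet

charac_coding_desired = 'UTF-8'
# Detect the actual encoding of the raw response bytes
encoding = cchardet.detect(resp.content)['encoding']
strg = resp.content  # used as-is if it is already UTF-8 (or detection fails)
if encoding and encoding.upper() != charac_coding_desired:
    # Decode with the detected encoding, then re-encode as UTF-8
    strg = resp.content.decode(encoding).encode(charac_coding_desired)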

Now I can parse the string directly:

ET.fromstring(strg)
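For anyone who cannot install cchardet, here is a minimal sketch of the same decode-then-re-encode idea using only the standard library. It rests on several assumptions: it uses xml.etree.ElementTree rather than lxml (so the exception is ParseError, not XMLSyntaxError), the helper name parse_xml_bytes is hypothetical, and the fallback encoding is guessed to be Latin-1 because the byte 0xB0 from the error message is the degree sign there. The sample bytes are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_xml_bytes(raw, fallback_encoding="latin-1"):
    """Parse XML bytes, retrying with a guessed encoding if the first parse fails.

    parse_xml_bytes and fallback_encoding are illustrative names, not part of
    any library. Latin-1 is only a guess at the real encoding of the data.
    """
    try:
        # First attempt: let the parser use the declared (or default UTF-8) encoding
        return ET.fromstring(raw)
    except ET.ParseError:
        # Retry: decode with the guessed encoding and hand the parser clean UTF-8
        text = raw.decode(fallback_encoding)
        return ET.fromstring(text.encode("utf-8"))


# Made-up sample: 0xB0 is not valid UTF-8 on its own, so the first parse fails
raw = b"<root><item>Temp 25\xb0 No</item></root>"
root = parse_xml_bytes(raw)
print(root.find("item").text)
```

Guessing a fixed fallback is cruder than detecting the encoding, so this only helps when you have a good idea of what the server actually sends.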

Thanks anyway!!!

Upvotes: 1
