Reputation: 59
I am having trouble parsing XML from a string directly into an Element. I have an XML file that I have transformed into a string:
resp = requests.post(request_url, request_string, proxies=urllib.getproxies(), stream=True)
As recommended here: https://stackoverflow.com/a/25023776/1551810, I used the content instead of the text:
response_tree = ET.fromstring(resp.content)
Apparently there is a syntax error in the XML file:
XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xB0 0x20 0x4E 0x6F, line 12, column 35
I tried encoding the content, but to no avail:
ET.fromstring(resp.content.encode('utf8'))
I get the same XMLSyntaxError as before. Can anyone help me? I have already spent two hours on this.
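The bytes quoted in the error message hint at the cause. A minimal sketch, assuming the payload is really ISO-8859-1 (Latin-1) or a similar single-byte encoding, which the error message alone does not confirm:

```python
# The four bytes quoted in the XMLSyntaxError message.
raw = b"\xb0\x20\x4e\x6f"

# 0xB0 is a UTF-8 continuation byte, so it cannot start a character
# and strict UTF-8 decoding fails on it.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Assumption: the data is Latin-1, where 0xB0 is the degree sign.
as_latin1 = raw.decode("iso-8859-1")

print(is_valid_utf8)
print(repr(as_latin1))
```

If that assumption holds, the response is not UTF-8 and carries no encoding declaration, so the parser cannot guess how to read it.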
Upvotes: 0
Views: 112
Reputation: 59
I finally found a great library that helped me solve the problem: cchardet (https://pypi.python.org/pypi/cchardet/0.3.5). I also followed @deets's advice.
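import cchardet
charac_coding_desired = 'UTF-8'
# Guess the encoding of the raw response bytes
encoding = cchardet.detect(resp.content)['encoding']
strg = resp.content
if charac_coding_desired != encoding:
    # Decode from the detected encoding, then re-encode as UTF-8
    strg = resp.content.decode(encoding).encode(charac_coding_desired)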
Now I can parse the string directly:
ET.fromstring(strg)
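The same decode and re-encode step can be sketched without the network call or the detection library. This is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for ET, and the payload, the helper name parse_with_encoding, and the choice of ISO-8859-1 are all assumptions rather than details of the original response.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: Latin-1 bytes with no XML encoding declaration.
xml_bytes = u"<doc><label>N\u00b0 No</label></doc>".encode("iso-8859-1")


def parse_with_encoding(data, source_encoding):
    """Decode data from source_encoding, re-encode it as UTF-8, and parse it."""
    utf8_bytes = data.decode(source_encoding).encode("utf-8")
    return ET.fromstring(utf8_bytes)


# Assumption: the source encoding is known (or was detected) to be ISO-8859-1.
root = parse_with_encoding(xml_bytes, "iso-8859-1")
label = root.find("label").text
print(repr(label))
```

Note that this relies on the document having no encoding declaration. If the XML declared a different encoding in its prolog, the re-encoded bytes would no longer match the declaration.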
Thanks anyway!!!
Upvotes: 1