Alienpenguin

Reputation: 1067

Getting accented characters from HTML with Python

In a web page's source I can see a word like abac%c3%a0, which the browser (Chrome) shows as abacà.
I have downloaded the page using urllib2 and am parsing the source with Python (2.7 on Mac OS X) to extract some keywords. I would like to get the accented character instead of %c3%a0, but calling str.decode("utf8") did not work (I tried it because those looked like the UTF-8 bytes \xc3\xa0).

What should I try in order to store the accented word in a dictionary?

By the way, the HTML page has no indication of its encoding anywhere in the source.

Thanks.

Upvotes: 0

Views: 30

Answers (1)

otus

Reputation: 5732

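The characters have been URL-encoded, also called percent-encoded (are they part of a URL?). You can undo this with urllib.unquote, which gives back the raw bytes \xc3\xa0; calling .decode("utf8") on that result then yields the accented character.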
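As a rough sketch of how this could look: str.decode("utf8") on its own changes nothing visible because the string holds the literal ASCII characters %, c, 3 and so on, not the bytes 0xC3 0xA0. Unquoting first turns them into bytes, which can then be decoded. The helper name decode_keyword and the keywords dictionary are made up for illustration, and the code assumes the page really is UTF-8 (the question says the page declares no encoding) and that the input on Python 2 is a byte string, as returned by urllib2.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import unquote        # Python 2.7, as used in the question
except ImportError:
    from urllib.parse import unquote  # same function on Python 3


def decode_keyword(raw):
    """Turn a percent-encoded keyword such as 'abac%c3%a0' into text.

    Hypothetical helper; assumes the escaped bytes are UTF-8.
    """
    text = unquote(raw)
    # On Python 2, unquote returns a byte string ('abac\xc3\xa0'),
    # so the UTF-8 bytes still have to be decoded to unicode.
    # On Python 3, unquote already returns decoded text.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return text


# Store the decoded word as a dictionary key (illustrative data only).
keywords = {}
keywords[decode_keyword("abac%c3%a0")] = 1
```

If the page turns out to use a different encoding, the decode step would need that encoding instead of UTF-8.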

Upvotes: 1
