Reputation: 1067
In a webpage's source I can see a word like abac%c3%a0, which the browser (Chrome) displays as abacà.
I have downloaded the page using urllib2 and am parsing the source with Python (2.7 on Mac OS X) to extract some keywords. I would like to get the accented character instead of %c3%a0, but calling str.decode("utf8") did not work (I tried that because the escapes look like the UTF-8 bytes \xc3\xa0).
What should I try in order to store the accented word in a dictionary?
By the way, the HTML page gives no indication of its encoding anywhere in the source.
Thanks.
Upvotes: 0
Views: 30
Reputation: 5732
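The characters have been URL-encoded, also called percent-encoded (are they part of a URL?), which you can undo using urllib.unquote. In Python 2 that gives you back the raw UTF-8 bytes \xc3\xa0 as a byte string, and calling .decode("utf8") on the result then works, because at that point there are real bytes to decode rather than literal %c3%a0 text.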
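A minimal sketch of that approach, assuming the keyword has already been pulled out of the page as a plain byte string. The names raw, decoded and keywords are made up for illustration, and the dictionary value is just a placeholder count. The try/except import is only there so the snippet also runs on Python 3, where the function lives in urllib.parse.

```python
try:
    from urllib import unquote        # Python 2.7, as in the question
except ImportError:
    from urllib.parse import unquote  # Python 3 location of the same function

# Hypothetical keyword as it appears in the page source.
raw = "abac%c3%a0"

# Undo the percent-encoding. On Python 2 this yields the UTF-8 bytes
# 'abac\xc3\xa0'; on Python 3 it already yields decoded text.
decoded = unquote(raw)

# On Python 2 the result is still a byte string, so decode it as UTF-8
# to get a unicode object. On Python 3 this branch is skipped.
if isinstance(decoded, bytes):
    decoded = decoded.decode("utf-8")

# Store the accented word as a dictionary key (the value is a placeholder).
keywords = {decoded: 1}
```

Since the page declares no encoding, treating the bytes as UTF-8 is itself an assumption; it fits here because %c3%a0 is the UTF-8 encoding of à.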
Upvotes: 1