Lupanoide

Reputation: 3222

Encoding data in utf-8

For a project I have to download data for many cities around the world, so the names contain special characters and accents, but I can't get them to display correctly.

I have tried to encode the data as UTF-8, but without luck. I don't know why: I get no errors in the terminal, yet city names still show up like this: L'H\u00f4pital Puits II, or like this: Marsza\u0142kowska, Warszawa.

Can someone help pinpoint the error, or what can I try?

import requests

w = open("cittadine.txt","wb")

fullMap = requests.get("http://aqicn.org/map/world/").text
print type(fullMap) # <type 'unicode'>
fullMap = fullMap.encode("utf-8")
w.write(fullMap)
w.close()

Upvotes: 0

Views: 118

Answers (1)

Alastair McCormack

Reputation: 27744

Your code is OK. The reason you're getting L'H\u00f4pital Puits II is that the server is sending that exact string:

curl "http://aqicn.org/map/world/" | grep -o "L'H\\\\u00f4pital Puits II"
L'H\u00f4pital Puits II

That string appears in a block of JSON, so you need to find that block and then decode it with the json module, which converts escape sequences such as \u00f4 back into the proper characters.

Beautiful Soup is probably the best way to find the JSON block.
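
As a rough illustration of the decoding step, here is a minimal sketch that uses only the standard library. It swaps Beautiful Soup for a plain regular expression to keep the example self-contained, and it runs against a made-up HTML excerpt: the sample_html string, the "stations" variable name, and the "city" key are assumptions for illustration, not the real structure of the aqicn.org page.

```python
import json
import re

# Hypothetical excerpt: the real page's markup and JSON layout may differ.
# Note the literal backslash-u sequences, as sent by the server.
sample_html = (
    '<script>var stations = '
    '[{"city": "L\'H\\u00f4pital Puits II"}, '
    '{"city": "Marsza\\u0142kowska, Warszawa"}];'
    '</script>'
)


def extract_city_names(html):
    """Find the embedded JSON array and return the decoded city names."""
    # Assumed pattern: the JSON array is assigned to a variable named "stations".
    match = re.search(r'var stations = (\[.*?\]);', html, re.S)
    if match is None:
        return []
    # json.loads turns \uXXXX escape sequences into real Unicode characters.
    stations = json.loads(match.group(1))
    return [station["city"] for station in stations]


names = extract_city_names(sample_html)
```

On the real page you would need to inspect the HTML first to see where the JSON actually lives, and adjust the search accordingly.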

Suggestion

A neater way to write UTF-8 to a file is to open it with io.open and an encoding, which returns a text wrapper (io.TextIOWrapper) that automatically encodes Unicode characters on write:

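import io
import requests

fullMap = requests.get("http://aqicn.org/map/world/").text
print type(fullMap) # <type 'unicode'>

# The file object encodes Unicode text as UTF-8 on write;
# the with block closes the file when done.
with io.open("cittadine.txt", "w", encoding="utf-8") as w:
    w.write(fullMap)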

If you need to write Unicode to a Windows terminal, install https://github.com/Drekin/win-unicode-console

Upvotes: 1
