Reputation: 62846
I am using the googlemaps Python package to do reverse geocoding. Observe:
PS Z:\dev\poc\SDR> python
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlemaps import GoogleMaps
>>> gmaps = GoogleMaps("*** my google API key ***")
>>> d=gmaps.reverse_geocode(51.75,19.46667)
>>> d
{u'Status': {u'code': 200, u'request': u'geocode'}, u'Placemark': [{u'Point': {u'coordinates': [19.466876, 51.7501456, 0]}, u'ExtendedData': {u'LatLonBox': {u'west': 19.465527, u'east': 19.468225, u'north': 51.7514946, u'south': 51.7487966}}, u'AddressDetails': {u'Country': {u'CountryName': u'Polska', u'AdministrativeArea': {u'SubAdministrativeArea': {u'SubAdministrativeAreaName': u'\u0141\xf3d\u017a', u'Locality': {u'Thoroughfare': {u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16'}, u'LocalityName': u'\u0141\xf3d\u017a'}}, u'AdministrativeAreaName': u'\u0142\xf3dzkie'}, u'CountryNameCode': u'PL'}, u'Accuracy': 8}, u'id': u'p1', u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland'}], u'name': u'51.750000,19.466670'}
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent = 2)
>>> pp.pprint(d)
{ u'Placemark': [ { u'AddressDetails': { u'Accuracy': 8,
u'Country': { u'AdministrativeArea': { u'AdministrativeAreaName': u'\u0142\xf3dzkie',
u'SubAdministrativeArea': { u'Locality': { u'LocalityName': u'\u0141\xf3d\u017a',
u'Thoroughfare': { u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16'}},
u'SubAdministrativeAreaName': u'\u0141\xf3d\u017a'}},
u'CountryName': u'Polska',
u'CountryNameCode': u'PL'}},
u'ExtendedData': { u'LatLonBox': { u'east': 19.468225,
u'north': 51.7514946,
u'south': 51.7487966,
u'west': 19.465527}},
u'Point': { u'coordinates': [19.466876, 51.7501456, 0]},
u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland',
u'id': u'p1'}],
u'Status': { u'code': 200, u'request': u'geocode'},
u'name': u'51.750000,19.466670'}
Now, I want to save the d dictionary to a file, but I do not want to see u'\u0141\xf3d\u017a' as the locality name. I want to see Łódź. Indeed:
So, I have tried this:
import codecs
with codecs.open("aa.txt", "w", "utf-8") as f:
    f.write(unicode(d))
and this:
with codecs.open("aa.txt", "w", "utf-8") as f:
    f.write(unicode(str(d), "utf-8"))
and this:
with open("aa.txt", "w") as f:
    f.write(unicode(d))
And of course, nothing works. All the attempts yield \u0141\xf3d\u017a in the file. How can I save it correctly?
Upvotes: 3
Views: 4312
Reputation: 41928
The first form is right for writing unicode to the file:
>>> import codecs
>>> s = u'\u0141\xf3d\u017a'
>>> with codecs.open('aa.txt', 'w', 'utf-8') as f:
...     f.write(s)
...
>>> with codecs.open('aa.txt', 'r', 'utf-8') as f:
...     print f.read()
...
Łódź
What's happening is that you're saving the representation of the dictionary when you use unicode(d).
>>> unicode(d)
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
Which is equivalent to:
>>> unicode(repr(d))
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
So, you aren't really writing Łódź to the file. Notice that the original escape sequences are themselves escaped. u'\u0141' is the Ł char, but u'\\u0141' is a string of 6 chars.
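The difference is easy to check by measuring lengths. This is only a small sketch; the variable names are made up for illustration, and the u'' prefix is accepted by Python 2 and by Python 3.3+:

```python
# One real character versus the escaped text that ends up in the repr.
real = u'\u0141'        # a single character: Ł
escaped = u'\\u0141'    # six characters: backslash, u, 0, 1, 4, 1

print(len(real))
print(len(escaped))
```

The first length is 1 and the second is 6, which is why the file ends up containing the literal text \u0141 instead of Ł.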
Since Python 2 dictionaries have no unicode representation that avoids this escaping, you should use a proper serialization method instead. Using json should be fine if the application that will read the file supports it.
If you really need to write a file readable by some other application that does not support the same serialization method, you have to iterate over the dict and write the key, value pairs one at a time, not the representation.
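As a rough sketch of that last option, the nested dict can be walked recursively and each leaf written as one line. The helper names iter_pairs and write_pairs, the "path = value" line format, and the shortened sample dict are all assumptions made for illustration, and io.open is used here because it behaves the same on Python 2.6+ and Python 3:

```python
import io
import os
import tempfile

def iter_pairs(obj, prefix=u''):
    """Yield (path, value) for every leaf of a nested dict/list (hypothetical helper)."""
    if isinstance(obj, dict):
        for key in sorted(obj):
            for pair in iter_pairs(obj[key], prefix + u'/' + key):
                yield pair
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            for pair in iter_pairs(item, u'%s[%d]' % (prefix, index)):
                yield pair
    else:
        yield prefix, obj

def write_pairs(d, path):
    # io.open encodes the text as UTF-8 while writing.
    with io.open(path, 'w', encoding='utf-8') as f:
        for key, value in iter_pairs(d):
            f.write(u'%s = %s\n' % (key, value))

# Shortened stand-in for the geocoder result in the question.
sample = {
    u'Placemark': [{u'address': u'\u0141\xf3d\u017a, Poland', u'id': u'p1'}],
    u'name': u'51.750000,19.466670',
}

path = os.path.join(tempfile.mkdtemp(), 'aa.txt')
write_pairs(sample, path)
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()
```

Because each value is written as text rather than through repr(), the file contains Łódź itself and not its escape sequences.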
Upvotes: 3
Reputation: 1103
A file is a stream of bytes, so your unicode text needs to be encoded (represented as bytes) before being saved to the file. When reading the data back from the file, you need to decode the bytes to unicode using the same encoding scheme, e.g. utf-8.
Be careful to write a serialization of your object to the file, not its representation. Use json.dumps(d) to obtain a serialization and json.loads(filecontent) to read it back.
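To make the encode/decode step concrete, here is a minimal sketch that does it by hand with a file opened in binary mode. The temporary path and variable names are only for illustration:

```python
import os
import tempfile

text = u'\u0141\xf3d\u017a'      # Łódź as unicode text
data = text.encode('utf-8')      # the bytes that actually go on disk

path = os.path.join(tempfile.mkdtemp(), 'aa.txt')
with open(path, 'wb') as f:      # binary mode: we write bytes, not text
    f.write(data)

with open(path, 'rb') as f:
    raw = f.read()

decoded = raw.decode('utf-8')    # same scheme on the way back
print(len(text), len(data))
```

The text has 4 characters but the encoded form has 7 bytes, since Ł, ó and ź each take two bytes in UTF-8. Helpers such as codecs.open() perform the same two steps for you.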
Upvotes: 1
Reputation: 799150
Pass ensure_ascii=False to json.dump*() and use codecs.open().
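A minimal sketch of that approach follows. It assumes io.open as a stand-in for codecs.open() (both encode to UTF-8 on write; io.open works unchanged on Python 2.6+ and 3), and the small dict and temporary path are made up for illustration:

```python
import io
import json
import os
import tempfile

# Shortened stand-in for the geocoder result.
d = {u'LocalityName': u'\u0141\xf3d\u017a', u'CountryNameCode': u'PL'}

path = os.path.join(tempfile.mkdtemp(), 'aa.txt')

# ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(d, ensure_ascii=False, indent=2, sort_keys=True))

with io.open(path, 'r', encoding='utf-8') as f:
    content = f.read()

restored = json.loads(content)
```

The file then contains "Łódź" literally, and json.loads() gives back a dict equal to the original. On Python 2, json.dumps(..., ensure_ascii=False) can return a byte str when the data contains no unicode strings at all, so this sketch relies on the keys and values being unicode as they are here.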
Upvotes: 4