rayimpr

Reputation: 95

How do I write a dict containing non-English characters to a file in Python 2.7.8?

Here is a simple example:

test = {'location': '北京', 'country': '中国'}  # the values are Chinese.  

I want the file test.log to contain:

{'location': '北京', 'country': '中国'} 

In Python 2.7.8, when I need to output data, I use the str() function.

file_out = open('test.log', 'w')
file_out.write(str(test))
file_out.close()

str() does not work when the dict contains non-ASCII characters. I know that the default encoding in Python 2 is ASCII, which does not support Chinese.

My question is: how can I write the dict to a file? Someone mentioned the json package to me, but I do not know how to use it.

Upvotes: 2

Views: 2560

Answers (2)

tripleee

Reputation: 189739

The code which populates this structure should produce Unicode strings (Python 2 u"..." strings), not byte strings (Python 2 "..." strings). See http://nedbatchelder.com/text/unipain.html for a good introduction to the pertinent differences between these two data types.

Building on (an earlier version of) m170897017's answer:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import json
test = {u'location': u'北京', u'country': u'中国'}
my_json = json.dumps(test, ensure_ascii=False).encode('utf8')
print my_json

If you have code which programmatically populates the location field, make it populate it with a Unicode string. For example, if you read UTF-8 data from somewhere, decode() it before putting it there.

def update_location ():
    location = '北京'
    return location.decode('utf-8')

test['location'] = update_location()

You could use other serialization formats besides JSON, including the str() representation of the Python structure, but JSON is standard, well-defined, and well-documented. JSON text is defined in terms of Unicode and is normally encoded as UTF-8, so it works trivially for non-English strings.
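
As a rough illustration of why ensure_ascii=False matters here, the sketch below compares the two forms json.dumps can produce. The name record is only a placeholder, and the unicode_literals import is an assumption made so the same snippet behaves alike on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import json

# Hypothetical sample data; every string here is a Unicode string.
record = {'location': '北京'}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False: the characters are kept as they are.
readable = json.dumps(record, ensure_ascii=False)

assert escaped == '{"location": "\\u5317\\u4eac"}'
assert readable == '{"location": "北京"}'

# Both forms are valid JSON and decode to the same dict.
assert json.loads(escaped) == json.loads(readable) == record
```

Either form round-trips correctly; the unescaped form is simply the one a person can read in the log file.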

Python 2 can work internally with either byte strings or Unicode strings, but in this scenario Unicode strings are emphatically recommended, and they will be the only sensible choice if and when you move to Python 3. Convert everything to Unicode as soon as you can, and convert back to an external representation (e.g. UTF-8) only when you have to.
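
One way to apply the "decode early" half of that advice is sketched below. The helpers to_text and normalize are hypothetical names, not part of any library, and the sketch assumes the incoming byte strings are UTF-8.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding it if it is a byte string."""
    # On Python 2, bytes is an alias for str, so this catches "..." literals.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize(mapping, encoding='utf-8'):
    """Return a copy of mapping whose keys and values are Unicode strings."""
    return dict((to_text(k, encoding), to_text(v, encoding))
                for k, v in mapping.items())


# Simulated external input: UTF-8 encoded byte strings.
raw = {b'location': '北京'.encode('utf-8'), b'country': '中国'.encode('utf-8')}
clean = normalize(raw)

assert clean == {'location': '北京', 'country': '中国'}
```

Values that are already Unicode pass through unchanged, so calling normalize on mixed data is safe under the stated UTF-8 assumption.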

Upvotes: 1

Stephen Lin

Reputation: 4912

Here is what you want.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import json
ori_test = {'location': '北京', 'country': '中国'}
test = dict([(unicode(k, "utf-8"), unicode(v, "utf-8")) for k, v in ori_test.items()])

my_dict = json.dumps(test, ensure_ascii=False).encode('utf8')
print my_dict
# then write my_dict to the local file as you want
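
The final step, writing the result to a file, could look roughly like the sketch below. It uses io.open so that the file object handles the UTF-8 encoding; dump_dict and load_dict are hypothetical helper names, and test.log is the file name from the question.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import json


def dump_dict(data, path):
    """Serialize data as JSON and write it to path, encoded as UTF-8."""
    text = json.dumps(data, ensure_ascii=False, sort_keys=True)
    # On Python 2, json.dumps can return a byte string when the input is
    # all ASCII; a text-mode file from io.open only accepts Unicode.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    with io.open(path, 'w', encoding='utf-8') as file_out:
        file_out.write(text)


def load_dict(path):
    """Read a UTF-8 encoded JSON file back into a dict."""
    with io.open(path, 'r', encoding='utf-8') as file_in:
        return json.load(file_in)


test = {'location': '北京', 'country': '中国'}
dump_dict(test, 'test.log')
assert load_dict('test.log') == test
```

With sort_keys=True the file contains {"country": "中国", "location": "北京"}, which is JSON rather than the Python repr shown in the question, so it uses double quotes.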

And this link could be helpful for you.

Upvotes: 2
