gokhan

Reputation: 39

Python unicode file writing

I'm using the twitter Python library to fetch tweets from a public stream. The library fetches tweets in JSON format and converts them to Python structures. What I want is to get the JSON string directly and write it to a file. Internally, the library first reads from a network socket and applies .decode('utf8') to the buffer, then wraps the data in a Python structure and returns it. I can use JSONEncoder to encode it back to a JSON string and save that to a file, but there seems to be a character encoding problem. When I print the JSON string, it looks fine in the console. When I write it to a file, however, escape sequences such as \u0627\u0644\u0644\u06be\u064f appear instead of the actual characters.

I tried opening the saved file with different encodings, and nothing changed. The file is supposed to be UTF-8, so when I display it, those escape sequences should be replaced with the actual characters they represent. Am I missing something? How can I achieve this?
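The \uXXXX sequences are not a file-encoding problem: they are JSON string escapes that the encoder emits by default. A minimal sketch illustrating this, assuming Python 3 and a made-up sample tweet (the `tweet` dict is hypothetical, not the twitter library's actual structure):

```python
import json

# Hypothetical tweet-like structure holding non-ASCII (Arabic) text.
tweet = {"text": u"\u0627\u0644\u0644\u06be\u064f"}

# With the default ensure_ascii=True, json.dumps escapes every
# non-ASCII character as a \uXXXX sequence, so the result is pure ASCII.
escaped = json.dumps(tweet)
print(escaped)  # {"text": "\u0627\u0644\u0644\u06be\u064f"}

# The escapes are valid JSON: parsing them gives back the original
# characters, so no information was lost.
decoded = json.loads(escaped)
print(decoded["text"] == tweet["text"])  # True
```

Any JSON parser reading such a file will recover the real characters; only a plain text viewer shows the escapes.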

More info:

I'm using Python 2.7.

I open the file like this:

json_file = open('test.json', 'w')

I also tried this:

json_file = codecs.open('test.json', 'w', 'utf-8')

Nothing changed. I also blindly tried .encode('utf8') and .decode('utf8') on the JSON string, with the same result. I tried different text editors to view the written text, and I used the cat command to see it in the console; the sequences starting with \u still appear.
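Opening the file with a UTF-8 codec cannot help here, because the string being written already consists only of ASCII characters. A minimal sketch reproducing the symptom, assuming Python 3 (it uses `io.open` with `encoding="utf-8"`, which for this pure-ASCII string writes the same bytes as the `codecs.open` call above; the sample `tweet` and the temporary path are hypothetical):

```python
import io
import json
import os
import tempfile

# Hypothetical sample data.
tweet = {"text": u"\u0627\u0644\u0644\u06be\u064f"}

# Default json.dumps (ensure_ascii=True) written through a UTF-8 text file,
# into a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with io.open(path, "w", encoding="utf-8") as json_file:
    json_file.write(json.dumps(tweet))

# Read back the raw bytes: the file holds the literal characters
# backslash, "u", "0", "6", "2", "7", ... and no UTF-8 Arabic bytes,
# because the string handed to the file was already pure ASCII.
with open(path, "rb") as raw_file:
    raw = raw_file.read()
print(raw)
```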

Update:

I solved the problem. JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

I set it to False and the problem went away.
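A minimal sketch of that fix, assuming Python 3 (the sample `tweet` and the temporary path are hypothetical): dump with `ensure_ascii=False`, then write the result to a file opened in text mode with an explicit UTF-8 encoding.

```python
import io
import json
import os
import tempfile

# Hypothetical sample data.
tweet = {"text": u"\u0627\u0644\u0644\u06be\u064f"}

# ensure_ascii=False keeps non-ASCII characters as-is, so the result
# is a text string containing the real Arabic letters.
text = json.dumps(tweet, ensure_ascii=False)

# Open the file in text mode with an explicit UTF-8 encoding so the
# characters are stored as UTF-8 bytes.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with io.open(path, "w", encoding="utf-8") as json_file:
    json_file.write(text)

# Reading it back as UTF-8 shows the actual characters, no \u escapes.
with io.open(path, encoding="utf-8") as json_file:
    content = json_file.read()
print(content)
```

The file stays valid JSON either way; `ensure_ascii=False` only changes whether non-ASCII characters are stored literally or as escapes.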

Upvotes: 2

Views: 1008

Answers (2)

gokhan

Reputation: 39

JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

Set it to False and the problem will go away.
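The option can be set either on a `json.JSONEncoder` instance or passed to `json.dumps`, which builds an encoder with it. A small sketch comparing the two, assuming Python 3 and a hypothetical sample `tweet`:

```python
import json

# Hypothetical sample data.
tweet = {"text": u"\u0627\u0644\u0644\u06be\u064f"}

# Set the option on an encoder instance directly...
encoder = json.JSONEncoder(ensure_ascii=False)
via_encoder = encoder.encode(tweet)

# ...or pass it to json.dumps, which creates an equivalent encoder.
via_dumps = json.dumps(tweet, ensure_ascii=False)

print(via_encoder == via_dumps)  # True
```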

Upvotes: 2

Jim DeLaHunt

Reputation: 11395

Well, since you won't post your solution as an answer, I will. This question should not be left showing no answer.

JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

Set it to False and the problem will go away.
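When saving many tweets from a stream, `json.dump` can write each object straight to the file without building an intermediate string. A sketch assuming Python 3 (under 2.7, `json.dump` may hand plain `str` chunks to an `io` text file, which rejects them); the sample tweets, the temporary path and the one-object-per-line layout are all hypothetical choices for illustration:

```python
import io
import json
import os
import tempfile

# Hypothetical sample tweets.
tweets = [{"text": u"\u0627\u0644\u0644\u06be\u064f"}, {"text": u"caf\u00e9"}]

path = os.path.join(tempfile.mkdtemp(), "tweets.json")
with io.open(path, "w", encoding="utf-8") as json_file:
    for tweet in tweets:
        # json.dump writes directly to the file object.
        json.dump(tweet, json_file, ensure_ascii=False)
        json_file.write(u"\n")  # one JSON document per line

# Read the lines back; each is a complete JSON object with real characters.
with io.open(path, encoding="utf-8") as json_file:
    lines = json_file.read().splitlines()
print(lines)
```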

Upvotes: 0
