Mahran

Reputation: 157

JSON and non-English languages

I'm new to Python and trying hard to learn it.

I was trying to save tweets using Tweepy, and because my query is in Arabic the results look strange, like this:

"created_at": "Mon Jun 12 15:12:50 +0000 2017", "id": 874283356158033920, "id_str": "874283356158033920", "text": "\\u0637\\u0627\\u0644\\u0628\\u0629 \\u062c\\u0633\\u0645\\u0647\\u0627 \\u062c\\u0628\\u0627\\u0631 \\u062a\\u062a\\u062e\\u062f \\u0645\\u0646 \\u0627\\u0644\\u0634\\u0627\\u0631\\u0639 \\u0648 \\u062a\\u062a\\u0646\\u0627\\u0643..\\n\\n\\u0633\\u0643\\u0633_\\u0627\\u062c\\u0646\\u0628\\u064a\\n\\u0645\\u0642\\u0627\\u0637\\u0639_\\u0633\\u0643\\u0633\\nbabes\\n2236 ", "truncated": false, "entities"

I have tried many times and read many similar questions here, but couldn't find the answer. Does JSON support the Arabic language?

Here is my code:

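import tweepy
import json
from pprint import pprint
import time

# Authenticate with the Twitter API (credentials left blank here).
auth = tweepy.OAuthHandler("", "")
auth.set_access_token("", "")
api = tweepy.API(auth)

max_tweets = 100
query = 'الشارع'

# Collect the raw JSON dict of each matching tweet.
searched_tweets = [status._json for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
# Serialize each tweet to a JSON string, then print the whole list.
json_strings = [json.dumps(json_obj) for json_obj in searched_tweets]
print(json_strings)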

I'm using Python 3.

Upvotes: 4

Views: 3258

Answers (1)

Alastair McCormack

Reputation: 27714

The problem is that, by default, json.dumps() encodes any non-ASCII characters using escaped Unicode notation (\uXXXX), which is optional in the JSON specification. Passing ensure_ascii=False to dumps() disables this behavior.
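As a minimal sketch of the difference, the snippet below serializes the same dict both ways. The tweet dict is a made-up stand-in, not real Tweepy output.

```python
import json

# Hypothetical stand-in for one tweet's JSON payload.
tweet = {"text": "الشارع"}

escaped = json.dumps(tweet)                       # default: ensure_ascii=True
readable = json.dumps(tweet, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # contains only ASCII, with \uXXXX escapes
print(readable)  # contains the Arabic text itself

# Both strings are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(readable) == tweet
```

Either form is valid JSON, so the choice only affects how readable the serialized text is.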

The second problem, which you'll hit once you've fixed the main one, is that you print a list. Python then prints a representation of the list, including the repr() of each item inside it. That representation is written as a literal: a form that is safe to print and to paste back into code.

For strings, this means each one is printed with quotes and with its backslashes doubled, which is why \u0637 shows up as \\u0637 in your output. (Python 2 would additionally escape every non-ASCII character; Python 3 only escapes non-printable ones.)
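To see the difference between printing a string and printing a list that contains it, here is a small sketch. The payload is a made-up example, not real tweet data.

```python
import json

# Hypothetical stand-in for one serialized tweet; the JSON text
# contains a backslash followed by "n" for the newline.
line = json.dumps({"text": "a\nb"})

print(line)    # str() form:  {"text": "a\nb"}
print([line])  # list form:   ['{"text": "a\\nb"}']
```

The list form adds surrounding quotes and doubles the backslash, because print() on a list shows the repr() of each element rather than its str().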

Try:

json_strings = [json.dumps(json_obj, ensure_ascii=False) for json_obj in searched_tweets]  
for tweet in json_strings:
    print(tweet)
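If the goal is to save the tweets rather than print them, one possible approach is to write one JSON document per line to a UTF-8 file. This is only a sketch under assumptions: the question does not show a file-writing step, and the file name tweets.jsonl and the sample data below are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the list of tweet dicts from the question.
searched_tweets = [{"id": 1, "text": "الشارع"}]

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "tweets.jsonl")

# Open the file explicitly as UTF-8 so the Arabic text is written as-is.
with open(path, "w", encoding="utf-8") as f:
    for tweet in searched_tweets:
        f.write(json.dumps(tweet, ensure_ascii=False) + "\n")

# Read the file back to confirm the data round-trips unchanged.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default encoding, which may not be able to represent Arabic characters.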

Upvotes: 5
