makansij

Reputation: 9865

How do I decode unicode characters via python?

I am trying to load the following JSON file in Python.

The file is called new_json.json:

{ "nextForwardToken": "f/3208873243596875673623625618474139659", "events": [ { "ingestionTime": 1045619, "timestamp": 1909000, "message": "2 32823453119 eni-889995t1 54.25.64.23 156.43.12.120 3389 23 6 342 24908 143234809 983246 ACCEPT OK" }] }

I have the following code to read the json file, and remove the unicode characters:

import json

JSON_FILE = "new_json.json"
with open(JSON_FILE) as infile:
    print infile
    print '\n type of infile is \n', type(infile)
    data = json.load(infile)
    str_data = str(data)  # convert to string to remove unicode characters
    wo_unicode = str_data.decode('unicode_escape').encode('ascii','ignore')
    print 'unicode characters have been removed \n'
    print wo_unicode

But print wo_unicode still prints the output with the unicode markers (i.e. the u prefix) in it.

The unicode characters cause a problem when trying to treat the json as a dictionary:

for item in data:
    iden = item.get['nextForwardToken']

...results in an error:

AttributeError: 'unicode' object has no attribute 'get'

This has to work in Python 2.7. Is there an easy way around this?

Upvotes: 0

Views: 365

Answers (2)

Padraic Cunningham

Reputation: 180391

The error has nothing to do with unicode. You are trying to treat the keys as dicts. Just call get on data itself to retrieve 'nextForwardToken':

print data.get('nextForwardToken')

When you iterate over data, you are iterating over its keys, so you end up calling 'nextForwardToken'.get('nextForwardToken'), 'events'.get('nextForwardToken') and so on, which is not going to work even with the correct syntax.
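To make the point about iteration concrete, here is a minimal sketch. It assumes a shortened copy of the JSON inlined as a string (the empty events list and the variable names are only for illustration) and is written so it runs on Python 2.7 as well as Python 3:

```python
import json

# Shortened stand-in for new_json.json; "events" is left empty here for brevity.
raw = '{"nextForwardToken": "f/3208873243596875673623625618474139659", "events": []}'
data = json.loads(raw)

# Iterating over a dict yields its keys, which are strings, not dicts.
keys = sorted(data)

# Strings have no .get method, which is where the AttributeError comes from.
key_has_get = [hasattr(key, 'get') for key in keys]

print(keys)
print(key_has_get)
```

The loop variable is only ever a key name, so the lookup has to happen on data, not on the item.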

Whether you access by data.get(u'nextForwardToken') or data.get('nextForwardToken'), both will return the value for the key:

In [9]: 'nextForwardToken' == u'nextForwardToken'
Out[9]: True
In [10]: data[u'nextForwardToken']
Out[10]: u'f/3208873243596875673623625618474139659'   
In [11]: data['nextForwardToken']
Out[11]: u'f/3208873243596875673623625618474139659'
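
If you also need the nested values, the same plain-key access works on the "events" list. This is only a sketch: the JSON is inlined as a string instead of read from new_json.json, and picking the second-to-last field of the message is an arbitrary choice for illustration, not something the question asks for:

```python
import json

# Inlined copy of the JSON from the question, split across lines for readability.
raw = (
    '{"nextForwardToken": "f/3208873243596875673623625618474139659", '
    '"events": [{"ingestionTime": 1045619, "timestamp": 1909000, '
    '"message": "2 32823453119 eni-889995t1 54.25.64.23 156.43.12.120 '
    '3389 23 6 342 24908 143234809 983246 ACCEPT OK"}]}'
)
data = json.loads(raw)

# Top-level value: index the dict directly with a plain string key.
token = data['nextForwardToken']

# "events" is a list of dicts, so this is the level where a loop makes sense.
actions = []
for event in data['events']:
    fields = event['message'].split()
    actions.append(fields[-2])  # second-to-last whitespace-separated field

print(token)
print(actions)
```

On Python 2.7 the values come back as unicode objects, but they compare equal to the matching str values, so the u prefix in the repr does not get in the way.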

Upvotes: 1

user4k

Reputation: 201

This code gives you the data as a str without the u prefixes, because json.dumps serializes the loaded data back into a JSON-formatted string:

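import json

JSON_FILE = "/tmp/json.json"
with open(JSON_FILE) as infile:
    print infile
    print '\n type of infile is \n', type(infile)
    data = json.load(infile)      # dict whose keys and values are unicode objects
    print data                    # the repr shows u'...' prefixes
    str_data = json.dumps(data)   # serialize back to a JSON-formatted str
    print str_data                # no u prefixes in the output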
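
Keep in mind that json.dumps returns one string for the whole document, and the loaded data itself is unchanged. A small check of that, assuming a shortened JSON string inlined for illustration (the names is_plain_str and round_trip are made up for this sketch):

```python
import json

# Shortened stand-in for the file contents.
raw = '{"nextForwardToken": "f/3208873243596875673623625618474139659", "events": []}'
data = json.loads(raw)

# json.dumps produces a single JSON-formatted string, not a dict.
str_data = json.dumps(data)
is_plain_str = isinstance(str_data, str)

# Parsing the string again gives back an equal dict, so nothing was lost.
round_trip = json.loads(str_data) == data

print(str_data)
```

So this is handy for printing or writing the JSON back out, while key lookups should still be done on data.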

Upvotes: 0
