samkhan13

Reputation: 3385

Reading JSON file using python produces spurious data

I am trying to parse this json file: http://pastebin.com/VcVR0ue0

While using these modules

from pprint import pprint
import codecs
import json

file = 'Desktop10000_760_CurtSacks.json'

I've tried these methods

a)

data = json.load(open(file))

b)

data = json.load(codecs.open(file, encoding='utf_8_sig'))

In both cases the output has a u inserted in front of every key and every string value:

{u'document_tone': {u'tone_categories': [{u'category_id': u'emotion_tone',
                                          u'category_name': u'Emotion Tone',
                                          u'tones': [{u'score': 0.111838,
                                                      u'tone_id': u'anger',
                                                      u'tone_name': u'Anger'},
                                                     {u'score': 0.159831,
                                                      u'tone_id': u'disgust',
                                                      u'tone_name': u'Disgust'},
                                                     {u'score': 0.17082,
                                                      u'tone_id': u'fear',
                                                      u'tone_name': u'Fear'},
                                                     {u'score': 0.507748,
                                                      u'tone_id': u'joy',
                                                      u'tone_name': u'Joy'},
                                                     {u'score': 0.520722,
                                                      u'tone_id': u'sadness',
                                                      u'tone_name': u'Sadness'}]},

How do I read the file correctly?

Upvotes: 0

Views: 51

Answers (2)

Bob Person

Reputation: 81

The 'u' prefix indicates a Python unicode string, which is normal. In Python 2 the json library returns unicode strings, so it looks like your data is being parsed properly.
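To illustrate why the prefix is harmless, here is a minimal sketch that uses a small excerpt shaped like the output in the question (the inline string `raw` is an assumption standing in for the real file). Plain string literals work as keys, because an ASCII str and the matching unicode string compare and hash equal in Python 2.

```python
import json

# Hypothetical excerpt with the same shape as the question's data.
raw = ('{"document_tone": {"tone_categories": [{"category_id": "emotion_tone", '
       '"tones": [{"score": 0.111838, "tone_id": "anger"}]}]}}')
data = json.loads(raw)

# Look up keys with ordinary string literals; no u prefix is needed.
category = data["document_tone"]["tone_categories"][0]
first_tone = category["tones"][0]

# Printing a value shows its text only; the prefix belongs to repr().
print("%s %s" % (first_tone["tone_id"], first_tone["score"]))
```

The same lookups should work unchanged on the dictionary returned by json.load for the real file.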

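If for whatever reason you don't want unicode strings, you can parse the file with yaml (the PyYAML package) instead, since most JSON is also valid YAML. In Python 2 it returns plain str objects for ASCII-only strings: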

import yaml  # PyYAML, a third-party package

with open(file) as f:  # close the file once parsing is done
    data = yaml.safe_load(f)
print(data)

So you'd get

{'key': 'item'}

Instead of

{u'key': u'item'}

Although I don't see a reason not to use unicode, as for most purposes it won't affect much. (see Python str vs unicode types)

Upvotes: 0

user94559

Reputation: 60143

It looks like everything's being parsed properly.

Python's syntax for a unicode string is:

u'Here is the string.'

So the Python equivalent of this JSON:

{"foo": "bar"}

is this:

{u'foo': u'bar'}

If you just print out the Python representation of the data, you'll see the Python syntax.
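As a rough sketch of that distinction, the snippet below parses the same small JSON document and prints it three ways. The inline JSON string is only an illustration, not the file from the question.

```python
import json

data = json.loads('{"foo": "bar"}')

# repr() gives Python syntax; under Python 2 this is where the u prefix appears.
python_repr = repr(data)
print(python_repr)

# Printing a single string value shows just its text.
value = data["foo"]
print(value)

# Serializing back with json.dumps gives JSON syntax again.
as_json = json.dumps(data)
print(as_json)
```

So if the goal is to display or save the data as JSON, json.dumps is one way to get output without the prefix while still keeping unicode strings in memory.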

Upvotes: 1
