Sherlock

Reputation: 1053

Accessing JSON file using Python, getting "Memory Error"

I am working with a JSON dataset (reddit comment data) that is about 5 GB in size. A single record looks like this:

{"subreddit":"languagelearning","parent_id":"t1_cn9nn8v","retrieved_on":1425123427,"ups":1,"author_flair_css_class":"","gilded":0,"author_flair_text":"Lojban (N)","controversiality":0,"subreddit_id":"t5_2rjsc","edited":false,"score_hidden":false,"link_id":"t3_2qulql","name":"t1_cnau2yv","created_utc":"1420074627","downs":0,"body":"I played around with the Japanese Duolingo for awhile and basically if you're not near Fluency you won't learn much of anything.\n\nAs was said below, the only one that really exists is Chineseskill.","id":"cnau2yv","distinguished":null,"archived":false,"author":"Pennwisedom","score":1}

I am using Python to list every "subreddit" value from this data, but I am getting a MemoryError. My code and the error are below.

import json
data = json.loads(open('/media/RC_2015-01').read())
for item in data:
    name = item.get("subreddit")
    print name

Traceback (most recent call last):
  File "name_python.py", line 4, in <module>
    data = json.loads(open('/media/RC_2015-01').read())
MemoryError

What I do know is that I am trying to load a very large file, which is why I am getting the MemoryError. Could anyone suggest a workaround?

Upvotes: 1

Views: 855

Answers (1)

Alex

Reputation: 21766

You need to use an iterative parser like ijson to parse one record at a time rather than loading the entire file into memory.

Regarding your error message, make sure your data is valid JSON with square brackets around the records. This structure will parse correctly:

[
 {...},
 {...}
]

whereas the following structure will raise an 'Additional data' exception:

{....}
{....}

Upvotes: 1
