Reputation: 11
I have a JSON file that is about 8GB in size. When I try to convert the file using this script:
import csv
import json
infile = open("filename.json","r")
outfile = open("data.csv","w")
writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.write(row)
I get this error:
Traceback (most recent call last):
File "E:/Thesis/DataDownload/PTDataDownload/demo.py", line 9, in <module>
for row in json.loads(infile.read()):
MemoryError
I'm sure this has to do with the size of the file. Is there a way to ensure the file will convert to a CSV without the error?
This is a sample of my JSON code:
{"id":"tag:search.twitter.com,2005:905943958144118786","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:899030045234167808","link":"http://www.twitter.com/NAJajsjs3","displayName":"NAJajsjs","postedTime":"2017-08-19T22:07:20.000Z","image":"https://pbs.twimg.com/profile_images/905943685493391360/2ZavxLrD_normal.jpg","summary":null,"links":[{"href":null,"rel":"me"}],"friendsCount":23,"followersCount":1,"listedCount":0,"statusesCount":283,"twitterTimeZone":null,"verified":false,"utcOffset":null,"preferredUsername":"NAJajsjs3","languages":["tr"],"favoritesCount":106},"verb":"post","postedTime":"2017-09-08T00:00:45.000Z","generator":{"displayName":"Twitter for iPhone","link":"http://twitter.com/download/iphone"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","body":"@thugIyfe Beyonce do better","object":{"objectType":"note","id":"object:search.twitter.com,2005:905943958144118786","summary":"@thugIyfe Beyonce do better","link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","postedTime":"2017-09-08T00:00:45.000Z"},"inReplyTo":{"link":"http://twitter.com/thugIyfe/statuses/905942854710775808"},"favoritesCount":0,"twitter_entities":{"hashtags":[],"user_mentions":[{"screen_name":"thugIyfe","name":"dari.","id":40542633,"id_str":"40542633","indices":[0,9]}],"symbols":[],"urls":[]},"twitter_filter_level":"low","twitter_lang":"en","display_text_range":[10,27],"retweetCount":0,"gnip":{"matching_rules":[{"tag":null,"id":6134817834619900217,"id_str":"6134817834619900217"}]}}
(sorry for the ugly formatting)
Alternatively, I have about 8000 smaller json files that I combined to make this one. They are each in their own folder, with just the single json file per folder. Would it be easier to convert each of these individually and then combine them into one csv?
I'm asking because I have very basic Python knowledge, and all the answers to similar questions that I have found are more complicated than I can understand. Please help this new Python user read this json as a csv!
Upvotes: 1
Views: 7702
Reputation: 101
Yes, it absolutely can be done in a very easy way. I opened a 4GB json file in a few seconds like this. I didn't need to convert to csv myself, but that step is just as easy.
Run the mongoimport command to load the JSON into MongoDB (here MongoDB is running in a Docker container):
docker exec -it container_id mongoimport --db test --collection data --file /tmp/data.json --jsonArray
Then run the mongoexport command to export the collection to csv:
docker exec -it container_id mongoexport --db test --collection data --csv --out data.csv --fields id,objectType
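Note: newer releases of the MongoDB database tools have dropped the --csv flag; if your version rejects it, the equivalent (as far as I know) is --type=csv:
docker exec -it container_id mongoexport --db test --collection data --type=csv --out data.csv --fields id,objectType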
Upvotes: 0
Reputation: 191681
Would it be easier to convert each of these individually and then combine them into one csv?
Yes, it certainly would
For example, this will put each JSON object/array (whatever is loaded from the file) onto its own line of a single CSV.
import json, csv
from glob import glob

with open('out.csv', 'w') as f:
    for fname in glob("*.json"):  # Reads all json from the current directory
        with open(fname) as j:
            f.write(str(json.load(j)))
            f.write('\n')
Use the glob pattern **/*.json (together with glob's recursive=True argument) to find all json files in nested folders.
It's not really clear what for row in ... was doing for your data, since you don't have a top-level array. Unless you wanted each JSON key to be a CSV column?
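If you did want each top-level key to become a CSV column, a minimal sketch of that idea (assuming each file holds a single top-level object like your sample, and keeping only the flat scalar fields; out.csv is just a placeholder name) could look like:
import csv, json
from glob import glob

with open('out.csv', 'w', newline='') as f:
    writer = None
    for fname in glob('**/*.json', recursive=True):  # all json files in nested folders
        with open(fname) as j:
            data = json.load(j)  # one top-level object per file
        # keep only scalar fields; nested dicts/lists would need flattening first
        row = {k: v for k, v in data.items() if not isinstance(v, (dict, list))}
        if writer is None:  # header taken from the first file's keys
            writer = csv.DictWriter(f, fieldnames=sorted(row), extrasaction='ignore')
            writer.writeheader()
        writer.writerow(row)
With your sample this would give columns such as id, verb, body, postedTime, etc.; any fields that appear only in later files would be dropped, since the header comes from the first file.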
Upvotes: 1