Andrew Rae-Zirn

Reputation: 3

How to convert a large JSON file into a CSV using Python

(Python 3.5) I am trying to parse a large user review.json file (1.3 GB) in Python and convert it to a .csv file. I have looked for a simple converter tool online, but most of them accept a maximum file size of 1 MB or are very expensive. As I am fairly new to Python, I have two questions:

  1. Is it even possible (and efficient) to do this, or should I be looking for another method?

  2. I tried the following code, but it only reads and writes the top 342 lines of my .json document and then raises an error:

  File "C:\Anaconda3\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Anaconda3\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data

This is the code I'm using:

import csv
import json

infile = open("myfile.json","r")
outfile = open ("myfile.csv","w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):
  writer.writerow(row)

My .json example:

Link to a small part of the JSON

My guess is that the error is related to my for loop and json.loads, but I do not know enough about it. Is it possible to create a dictionary {} and convert just the values "user_id", "stars" and "text"? Or am I dreaming?

Any suggestions or criticism are appreciated.

Upvotes: 0

Views: 7095

Answers (2)

nigel222

Reputation: 8192

Sometimes it's not as easy as having one JSON definition per line of input. A JSON definition can spread over multiple lines, and when reading line by line it is not always easy to tell which braces start and end a definition (for example, when strings contain braces or structures are nested).

The answer is to use the raw_decode method of json.JSONDecoder to fetch the JSON definitions from the file one at a time. This works for any sequence of concatenated valid JSON definitions. It's described further in my answer here: Importing wrongly concatenated JSONs in Python
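A minimal sketch of that idea, assuming the whole file's text fits in memory as one string. `iter_json_documents` is a hypothetical helper name, not part of the `json` module. Note that `raw_decode` itself does not skip leading whitespace, so the loop skips it before each call:

```python
import json
from typing import Any, Iterator


def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode() raises on leading whitespace, so skip it ourselves.
        while pos < length and text[pos] in " \t\r\n":
            pos += 1
        if pos >= length:
            return
        # raw_decode() returns the decoded value and the absolute index
        # just past it, even if the value spans several lines or its
        # strings contain braces.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Usage would be along the lines of `for review in iter_json_documents(f.read()): ...`. Because this keeps the entire text in memory, a 1.3 GB file would be better served by a variant that reads chunks into a buffer and decodes from it, at the cost of more code.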

Upvotes: 0

Daniel Roseman

Reputation: 599480

This is not a JSON file; this is a file containing individual lines of JSON. You should parse each line individually.

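for row in infile:
    data = json.loads(row)
    # writerow() on a dict would write its keys, so pick the values explicitly
    writer.writerow([data["user_id"], data["stars"], data["text"]])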
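For the question's goal, here is a sketch that streams the file line by line into a CSV with `csv.DictWriter`, keeping only `user_id`, `stars` and `text`. It assumes one JSON object per line; `json_lines_to_csv` and the file names are hypothetical. Since only one line is held in memory at a time, the 1.3 GB size should not be a problem:

```python
import csv
import json

# Assumption: each non-blank input line is one complete JSON object
# (JSON Lines) that may carry "user_id", "stars" and "text" keys.
FIELDS = ["user_id", "stars", "text"]


def json_lines_to_csv(in_path, out_path, fields=FIELDS):
    """Stream a JSON Lines file into a CSV, keeping only the given keys."""
    count = 0
    # newline="" is what the csv module documentation recommends for output files.
    with open(in_path, encoding="utf-8") as infile, \
            open(out_path, "w", newline="", encoding="utf-8") as outfile:
        # extrasaction="ignore" silently drops keys not listed in fields;
        # missing keys are written as empty cells (DictWriter's default restval).
        writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in infile:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            writer.writerow(json.loads(line))
            count += 1
    return count
```

Calling `json_lines_to_csv("myfile.json", "myfile.csv")` would write a header row followed by one row per review and return the number of rows written.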

Upvotes: 1
