Shabina Rayan

Reputation: 419

Processing 50 GB of JSON into a Pandas DataFrame

I have about 50 GB of data spread across 6,000 JSON files, which I am currently loading into a pandas DataFrame using the method below (the format_pandas function reshapes each parsed JSON document and appends its rows to records):

import glob
import json
import os
from pathlib import Path

import pandas as pd

path = '/Users/shabina.rayan/Desktop/Jupyter/Scandanavia Weather/Player  Data'
records = []
for filename in glob.glob(os.path.join(path, '*.JSON')):
    file = Path(filename)
    with open(file) as json_data:
        j = json.load(json_data)
        format_pandas(j)  # appends formatted rows to records

# serialize the accumulated records back to JSON, then re-parse into a DataFrame
pandas_json = json.dumps(records)
df = pd.read_json(pandas_json, orient="records")

As you can guess, this takes an excruciatingly long time to process. Does anyone have suggestions for another way to process 50 GB of JSON files and visualize/analyze the result?

Upvotes: 1

Views: 720

Answers (1)

abhijeet

Reputation: 107

Dump it into Elasticsearch and run queries as required.
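For illustration, here is a minimal sketch of that approach using the official elasticsearch Python client. The host URL, index name, and the generate_actions helper are assumptions for the example, not part of the original answer:

import glob
import json
import os

from elasticsearch import Elasticsearch, helpers

# Hypothetical settings -- adjust the host, index name, and path to your setup.
es = Elasticsearch("http://localhost:9200")
INDEX = "player-data"
path = "/path/to/json/files"

def generate_actions():
    """Yield one bulk-index action per JSON document."""
    for filename in glob.glob(os.path.join(path, "*.JSON")):
        with open(filename) as f:
            doc = json.load(f)
        yield {"_index": INDEX, "_source": doc}

# helpers.bulk streams the generator in batches, so the full
# 50 GB never has to sit in memory at once
helpers.bulk(es, generate_actions())

Once indexed, aggregations and filtered queries run server-side, so pandas only needs to hold the much smaller query results rather than the full dataset.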

Upvotes: 2
