Reputation: 419
I have about 50 GB of data spread across roughly 6,000 JSON files, which I am currently loading into a pandas DataFrame using the method below (the format_pandas function builds up my pandas rows while reading each JSON record):
import glob
import json
import os
from pathlib import Path
import pandas as pd

path = '/Users/shabina.rayan/Desktop/Jupyter/Scandanavia Weather/Player Data'
records = []
for filename in glob.glob(os.path.join(path, '*.JSON')):
    file = Path(filename)
    with open(file) as json_data:
        j = json.load(json_data)
        format_pandas(j)  # appends formatted rows to records

pandas_json = json.dumps(records)
df = pd.read_json(pandas_json, orient="records")
As you can probably guess, this takes an excruciatingly long time to process. Does anyone have suggestions on another way I can process 50 GB of JSON files and visualize/analyze the data?
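One variation I have considered (sketched below, assuming format_pandas simply appends one flat dict per row to records) is building the DataFrame straight from the list of records instead of round-tripping through json.dumps and pd.read_json, but I am not sure it will scale to 50 GB:

import glob
import json
import os
import pandas as pd

path = '/Users/shabina.rayan/Desktop/Jupyter/Scandanavia Weather/Player Data'
records = []
for filename in glob.glob(os.path.join(path, '*.JSON')):
    with open(filename) as json_data:
        j = json.load(json_data)
        format_pandas(j)  # assumed to append flat dicts to records

# Build the DataFrame directly from the list of dicts,
# skipping the json.dumps -> pd.read_json round trip
df = pd.DataFrame.from_records(records)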
Upvotes: 1
Views: 720