Howell Yu

Reputation: 73

Alternative Way to Load Large Json File

I am trying to load a large JSON file (around 4 GB) as a pandas dataframe, but the following method does not work for files larger than about 2 GB. Is there any alternative method?

```python
data_dir = 'data.json'
my_data = pd.read_json(data_dir, lines=True)
```

I tried ijson but have no idea how to convert its output to a dataframe.
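For reference, `pd.read_json` itself can read line-delimited JSON in chunks via its `chunksize` parameter, which returns an iterator of DataFrames instead of loading everything at once. A minimal runnable sketch (the tiny sample file and chunk size are placeholders; with the real 4 GB file you would pass its path and a much larger chunk size):

```python
import json
import pandas as pd

# Write a tiny line-delimited sample so this sketch is runnable;
# with the real file you would pass its path directly.
with open('sample.json', 'w') as f:
    for i in range(10):
        f.write(json.dumps({'id': i, 'val': i * 2}) + '\n')

# chunksize makes read_json return an iterator of DataFrames,
# so the whole file never has to fit in memory at once.
reader = pd.read_json('sample.json', lines=True, chunksize=4)
my_data = pd.concat(reader, ignore_index=True)
```

If the goal is aggregation rather than one big dataframe, you can process each chunk inside the loop and discard it, keeping memory bounded.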

Upvotes: 2

Views: 2145

Answers (1)

software.wikipedia

Reputation: 719

Loading the entire document into memory may not be the best approach here. A JSON file of this size calls for a streaming parser instead. Some options:

https://pypi.org/project/json-stream-parser/

https://pypi.org/project/ijson/

The key is to not load the entire document in memory. This is similar to SAX parsing in the XML world.

I am not a Python expert; however, there should already be a good library that does this for you.

Upvotes: 1
