Reputation: 73
I am trying to load a large JSON file (around 4 GB) as a pandas DataFrame, but the following method does not work for files larger than about 2 GB. Is there an alternative method?
import pandas as pd

data_dir = 'data.json'
my_data = pd.read_json(data_dir, lines=True)
I tried ijson, but I have no idea how to convert its output to a DataFrame.
Upvotes: 2
Views: 2145
Reputation: 719
Loading the entire document into memory may not be the best approach here. A JSON file of this size calls for a streaming parser, which processes the input incrementally rather than reading it all at once. Some options:
https://pypi.org/project/json-stream-parser/
https://pypi.org/project/ijson/
The key is to not load the entire document into memory. This is similar to SAX parsing in the XML world.
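As a rough sketch of how ijson could feed a DataFrame in batches (this assumes the file holds a single top-level JSON array; the 'item' prefix and the batch size below are illustrative, not specific to your data):

import ijson
import pandas as pd

def stream_to_dataframe(path, batch_size=100_000):
    batches = []
    rows = []
    with open(path, 'rb') as f:
        # 'item' walks the elements of a top-level JSON array one at a
        # time, so only the current record plus the batch buffer is in memory.
        for record in ijson.items(f, 'item'):
            rows.append(record)
            if len(rows) >= batch_size:
                batches.append(pd.DataFrame(rows))
                rows = []
    if rows:
        batches.append(pd.DataFrame(rows))
    return pd.concat(batches, ignore_index=True)

df = stream_to_dataframe('data.json')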
I am not a Python expert, but there should be a good library that can already do this for you.
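That said, since your call passes lines=True, the file is presumably line-delimited JSON, and pandas itself can read that format in chunks rather than all at once; a minimal sketch (the chunk size is illustrative):

import pandas as pd

# chunksize makes read_json return an iterator of smaller DataFrames
# instead of parsing the whole 4 GB file in one go.
reader = pd.read_json('data.json', lines=True, chunksize=100_000)

# Process each chunk separately, or concatenate if the result fits in memory.
df = pd.concat(reader, ignore_index=True)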
Upvotes: 1