Calcutta

Reputation: 1149

Error loading data from a JSON file into a Python pandas DataFrame

I have a JSON file with multiple 'records'. I can easily load it into a MongoDB database and then extract certain records from MongoDB into a pandas DataFrame, and this works just fine. However, I want to avoid the MongoDB route and load all the records in the JSON file directly into a pandas DataFrame. I thought this would be easy, but it is not working at all.

This is what I have done:

import pandas as pd
!wget -O peopleData.json -q https://github.com/prithwis/parashar21/raw/main/data/peopleDataTest5.json
data = pd.read_json('/content/peopleData.json')
#data = pd.read_json('/content/peopleData.json', lines=True)

This throws errors, both with and without the lines=True option. I am using Google Colab and the notebook is available at this link.

I have seen quite a few other questions on Stack Overflow that seem to address the same problem, but none of the answers work in my case. I would be grateful if someone could help me fix this.

Upvotes: 0

Views: 160

Answers (1)

Calcutta

Reputation: 1149

Placing a newline character between each pair of successive JSON objects solves the problem!

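import pandas as pd

# Retrieve the JSON file from GitHub
!wget -O peopleData.json -q https://github.com/prithwis/parashar21/raw/main/data/peopleDataTest5.json
!cat peopleData.json
# Show where one object ends and the next begins with no separator
!grep '}{' peopleData.json
# Insert a newline between back-to-back objects (GNU sed, as on Colab)
!sed -i 's/}{/}\n{/g' peopleData.json
!cat peopleData.json
# Each line now holds one JSON object, so lines=True can parse the file
data = pd.read_json('./peopleData.json', lines=True)
data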

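I inserted a \n between every }{ pair using sed. Before this, the file was one continuous line of JSON objects placed back to back with no separator, which is neither a single valid JSON document nor line-delimited JSON. Now it has 5 separate lines, one object per line, so read_json() works with the lines=True option. Note that the sed substitution would also split a record if the characters }{ ever appeared inside a string value.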
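If editing the file with sed is not desirable, the same splitting can probably be done in pure Python. The following is only a minimal sketch, not what I actually ran: the helper name read_concatenated_json and the sample string are made up for illustration, and it assumes the file really is a sequence of JSON values placed back to back. It relies on json.JSONDecoder.raw_decode from the standard library, which parses one value and reports where it stopped, so a }{ inside a string value does not confuse it.

```python
import json


def read_concatenated_json(text):
    """Parse JSON values that sit back to back with no separator.

    Hypothetical helper for illustration; returns a list of parsed values.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Parse one value starting at pos; the returned index is where it ended
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


# Made-up sample: two objects with no separator, one containing "}{" in a string
sample = '{"name": "A", "age": 1}{"name": "B}{", "age": 2}'
records = read_concatenated_json(sample)
print(records)
```

For the downloaded file, the text would be read with open('peopleData.json').read() and the resulting list of dicts passed to pd.DataFrame(records), which should give a DataFrame comparable to the one from read_json() with lines=True.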

Upvotes: 1
