Reputation:
How can we convert this to dataframes? I have tried multiple ways on how it can be achived, i have tried with json file on w3school but it is working correctly, i am new with python, any recommendations on this? Json format is
[
{
"id": 14256,
"city": {
"id": {
"$numberLong": "14256"
},
"name": "Azadshahr",
"findname": "AZADSHAHR",
"country": "IR",
"coord": {
"lon": 48.570728,
"lat": 34.790878
},
"zoom": {
"$numberLong": "10"
}
}
},
{
"id": {
"$numberLong": "465726"
},
"city": {
"id": {
"$numberLong": "465726"
},
"name": "Zadonsk",
"findname": "ZADONSK",
"country": "RU",
"coord": {
"lon": 38.926102,
"lat": 52.3904
},
"zoom": {
"$numberLong": "16"
}
}
}
]
The expected output is :
it tried to do a conversion but i am receiving error and it is not the whole data
with open('data/history.city.list.json') as f:
data = json.load(f)
but not able to load as data, This is what i have tried but i feel
_id = []
country = []
coord_lat = []
coord_lon = []
counter = 0
for i in data:
_id.append(data[counter]['id'])
country.append(data[counter]['city']['country'])
coord_lat.append(data[counter]['city']['coord']['lon'])
coord_lat.append(data[counter]['city']['coord']['lat'])
counter += 1
When i have tried to print it as a dataframe
df = pd.DataFrame({'Longtitude' : coord_lat , 'Latitude' : coord_lat})
df.head(10)
This was able to set it to dataframe, but as soon as i add 'Country' to pd.dataframe() , it will return as ValueError: arrays must all be same length.
i understand that country column does not match the other columns but can we achieve this and is there a simpler way to do it ?
Upvotes: 2
Views: 250
Reputation: 3929
You can use json_normalize() as described here:
import pandas as pd
d = [
{
"id": 14256,
"city": {
"id": {
"$numberLong": "14256"
},
"name": "Azadshahr",
"findname": "AZADSHAHR",
"country": "IR",
"coord": {
"lon": 48.570728,
"lat": 34.790878
},
"zoom": {
"$numberLong": "10"
}
}
},
{
"id": {
"$numberLong": "465726"
},
"city": {
"id": {
"$numberLong": "465726"
},
"name": "Zadonsk",
"findname": "ZADONSK",
"country": "RU",
"coord": {
"lon": 38.926102,
"lat": 52.3904
},
"zoom": {
"$numberLong": "16"
}
}
}
]
pd.io.json.json_normalize(d)
Output:
id city.id.$numberLong city.name city.findname city.country city.coord.lon city.coord.lat city.zoom.$numberLong id.$numberLong
0 14256.0 14256 Azadshahr AZADSHAHR IR 48.570728 34.790878 10 NaN
1 NaN 465726 Zadonsk ZADONSK RU 38.926102 52.390400 16 465726
The column names do not match your expected output, but you can change that easily with df.columns = ['Id', 'city', ... 'Zoom']
Upvotes: 0