user2805507
user2805507

Reputation:

Converting json to pandas dataframe with weather datasets

How can we convert this to dataframes? I have tried multiple ways on how it can be achived, i have tried with json file on w3school but it is working correctly, i am new with python, any recommendations on this? Json format is

[
  {
    "id": 14256,
    "city": {
      "id": {
        "$numberLong": "14256"
      },
      "name": "Azadshahr",
      "findname": "AZADSHAHR",
      "country": "IR",
      "coord": {
        "lon": 48.570728,
        "lat": 34.790878
      },
      "zoom": {
        "$numberLong": "10"
      }
    }
  },
  {
    "id": {
      "$numberLong": "465726"
    },
    "city": {
      "id": {
        "$numberLong": "465726"
      },
      "name": "Zadonsk",
      "findname": "ZADONSK",
      "country": "RU",
      "coord": {
        "lon": 38.926102,
        "lat": 52.3904
      },
      "zoom": {
        "$numberLong": "16"
      }
    }
  }
]

The expected output is :

enter image description here

it tried to do a conversion but i am receiving error and it is not the whole data

with open('data/history.city.list.json') as f:
    data = json.load(f)

but not able to load as data, This is what i have tried but i feel

_id = []
country = []
coord_lat = []
coord_lon = []

counter = 0
for i in data:
    _id.append(data[counter]['id'])
    country.append(data[counter]['city']['country'])
    coord_lat.append(data[counter]['city']['coord']['lon'])
    coord_lat.append(data[counter]['city']['coord']['lat'])
    counter += 1

When i have tried to print it as a dataframe

df = pd.DataFrame({'Longtitude' : coord_lat , 'Latitude' : coord_lat})    
df.head(10)

This was able to set it to dataframe, but as soon as i add 'Country' to pd.dataframe() , it will return as ValueError: arrays must all be same length.

i understand that country column does not match the other columns but can we achieve this and is there a simpler way to do it ?

Upvotes: 2

Views: 250

Answers (1)

above_c_level
above_c_level

Reputation: 3929

You can use json_normalize() as described here:

import pandas as pd

d = [
  {
    "id": 14256,
    "city": {
      "id": {
        "$numberLong": "14256"
      },
      "name": "Azadshahr",
      "findname": "AZADSHAHR",
      "country": "IR",
      "coord": {
        "lon": 48.570728,
        "lat": 34.790878
      },
      "zoom": {
        "$numberLong": "10"
      }
    }
  },
  {
    "id": {
      "$numberLong": "465726"
    },
    "city": {
      "id": {
        "$numberLong": "465726"
      },
      "name": "Zadonsk",
      "findname": "ZADONSK",
      "country": "RU",
      "coord": {
        "lon": 38.926102,
        "lat": 52.3904
      },
      "zoom": {
        "$numberLong": "16"
      }
    }
  }
]

pd.io.json.json_normalize(d)

Output:

    id          city.id.$numberLong     city.name   city.findname   city.country    city.coord.lon  city.coord.lat  city.zoom.$numberLong   id.$numberLong
0   14256.0     14256                  Azadshahr    AZADSHAHR      IR               48.570728      34.790878       10   NaN
1   NaN         465726                 Zadonsk      ZADONSK        RU               38.926102      52.390400        16  465726

The column names do not match your expected output, but you can change that easily with df.columns = ['Id', 'city', ... 'Zoom']

Upvotes: 0

Related Questions