Mike Zoucha
Mike Zoucha

Reputation: 83

Accessing JSON elements with Python and Pandas

I have a few questions about JSON in Pandas! I have gotten the data loaded into a dataframe and can search effectively with the query code below. Three questions I have:

  1. How can I get the individual pieces of 'coord' (i.e. 'coord.lon')
  2. How do I assign specific columns to variables in the code? (i.e. state = df.loc[df['name'] == city, ['state'])?
  3. How can I add additional search terms? (i.e. df['name'] == city AND df['state'] == state)

Here is the code I have so far to pull the records I need:

query = df.loc[df['name'] == city, ['id', 'name', 'state', 'coord']]

Here is a sample of the JSON response I am trying to search:

{
    "id": 5074329,
    "name": "Oakland",
    "state": "NE",
    "country": "US",
    "coord": {
        "lon": -96.466972,
        "lat": 41.835831
    }
},
{
    "id": 5074472,
    "name": "Omaha",
    "state": "NE",
    "country": "US",
    "coord": {
        "lon": -95.93779,
        "lat": 41.25861
    }
}

As always, THANKS!!

edit: Here is a picture of a couple rows of the dataframe:

url = 'http://bulk.openweathermap.org/sample/city.list.json.gz'

df = pd.read_json(url)

enter image description here

Upvotes: 2

Views: 2081

Answers (1)

Nk03
Nk03

Reputation: 14949

If you have the dataframe like this:

        id     name state country                                  coord
0  5074329  Oakland    NE      US  {'lon': -96.466972, 'lat': 41.835831}
1  5074472    Omaha    NE      US    {'lon': -95.93779, 'lat': 41.25861}
  1. To access lon you can use the str method:
df['lon'] = df.coord.str['lon'] 

#output
        id     name state country                                  coord  \
0  5074329  Oakland    NE      US  {'lon': -96.466972, 'lat': 41.835831}   
1  5074472    Omaha    NE      US    {'lon': -95.93779, 'lat': 41.25861}   

         lon  
0 -96.466972  
1 -95.937790 
  1. I guess you wanna do boolean indexing here:
city =  'Omaha'
state = df.loc[df['name'] == city]['state']

The above code will return the pandas series. If you want only the 1st value use iloc:

city =  'Omaha'
state = df.loc[df['name'] == city]['state'].iloc[0]

# output
'NE'
  1. This is quite similar to the 2nd part:
city = 'Omaha'
state = 'NE'
suset_df = df.loc[(df['name'] == city) & (df['state'] == state)]

# OUTPUT
        id   name state country                                coord       lon
1  5074472  Omaha    NE      US  {'lon': -95.93779, 'lat': 41.25861} -95.93779

Upvotes: 3

Related Questions