Axis
Axis

Reputation: 2132

JSON to Python Dataframe

I have a weather API JSON data. I got from a website then converted python dictionary

markit_dict = json.loads(response.content)
markit_dict

then I turned a Dataframe

enter image description here

but as you can see the weather column needs to separate 3 different columns

when I choose each column to turn data frame I can

wh = pd.DataFrame(openwet.iloc[1,6])
wh

    description     icon id  main
0   broken clouds   04d 803 Clouds

Last time I tried to put in a for loop to make dataframe but I could not

EDIT:

openwet = pd.DataFrame(markit_dict)
openwet['weather'].values

output :

array([ [{u'main': u'Clouds', u'id': 803, u'icon': u'04d', u'description': u'broken clouds'}],
       [{u'main': u'Clouds', u'id': 803, u'icon': u'04d', u'description': u'broken clouds'}],
       [{u'main': u'Clouds', u'id': 804, u'icon': u'04d', u'description': u'overcast clouds'}],
       [{u'main': u'Clouds', u'id': 804, u'icon': u'04d', u'description': u'overcast clouds'}],

I need to make a Dataframe the weather column. Also I put my json data might be someone can find different way.

url = "http://history.openweathermap.org//storage/debd7a72617dd61b0fc871a2c83fcabf.json"
response = requests.get(url)

response.content

Upvotes: 2

Views: 1040

Answers (2)

jezrael
jezrael

Reputation: 863791

I think you need json_normalize for create DataFrame and column weather is first select lists with str[0], then convert to numpy array by values and then to DataFrame. (If need rename column names is possible add add_prefix) Last concat to original:

import urllib.request, json

url = "http://history.openweathermap.org//storage/debd7a72617dd61b0fc871a2c83fcabf.json"
#http://stackoverflow.com/a/12965254/2901002
with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())

from pandas.io.json import json_normalize    
df = json_normalize(data)
df1 = pd.DataFrame(df['weather'].str[0].values.tolist()).add_prefix('weather.')
print (df1.head())
  weather.description weather.icon  weather.id weather.main
0       broken clouds          04d         803       Clouds
1       broken clouds          04d         803       Clouds
2     overcast clouds          04d         804       Clouds
3     overcast clouds          04d         804       Clouds
4     overcast clouds          04n         804       Clouds

df = pd.concat([df.drop('weather', 1), df1], axis=1)
print (df.head(10))
   city_id  clouds.all          dt                         dt_iso  \
0  2193733          76  1447462800  2015-11-14 01:00:00 +0000 UTC   
1  2193733          76  1447470000  2015-11-14 03:00:00 +0000 UTC   
2  2193733          88  1447477200  2015-11-14 05:00:00 +0000 UTC   
3  2193733          88  1447480800  2015-11-14 06:00:00 +0000 UTC   
4  2193733          88  1447488000  2015-11-14 08:00:00 +0000 UTC   
5  2193733          88  1447491600  2015-11-14 09:00:00 +0000 UTC   
6  2193733          36  1447495200  2015-11-14 10:00:00 +0000 UTC   
7  2193733          36  1447498800  2015-11-14 11:00:00 +0000 UTC   
8  2193733          88  1447506000  2015-11-14 13:00:00 +0000 UTC   
9  2193733          88  1447513200  2015-11-14 15:00:00 +0000 UTC   

   main.humidity  main.pressure  main.temp  main.temp_max  main.temp_min  \
0             52           1020     291.15         291.15         291.15   
1             45           1018     291.15         291.15         291.15   
2             48           1017     290.15         290.15         290.15   
3             55           1017     289.15         289.15         289.15   
4             58           1017     287.15         287.15         287.15   
5             62           1017     286.15         286.15         286.15   
6             71           1017     286.15         286.15         286.15   
7             71           1016     286.15         286.15         286.15   
8             76           1015     286.15         286.15         286.15   
9             87           1014     287.15         287.15         287.15   

   rain.3h  wind.deg  wind.speed weather.description weather.icon  weather.id  \
0      NaN       250           6       broken clouds          04d         803   
1      NaN       240           7       broken clouds          04d         803   
2      NaN       270           6     overcast clouds          04d         804   
3      NaN       250           4     overcast clouds          04d         804   
4      NaN       310           2     overcast clouds          04n         804   
5      NaN       310           2     overcast clouds          04n         804   
6      NaN       350           1    scattered clouds          03n         802   
7      NaN        10           2    scattered clouds          03n         802   
8      NaN       350           2     overcast clouds          04n         804   
9      NaN       340           3     overcast clouds          04n         804   

  weather.main  
0       Clouds  
1       Clouds  
2       Clouds  
3       Clouds  
4       Clouds  
5       Clouds  
6       Clouds  
7       Clouds  
8       Clouds  
9       Clouds  

Upvotes: 4

Prakhar Verma
Prakhar Verma

Reputation: 467

If I understood your question correctly, I think your are almost there. :)

weatherArray = [ [{u'main': u'Clouds', u'id': 803, u'icon': u'04d', u'description': u'broken clouds'}],
       [{u'main': u'Clouds', u'id': 803, u'icon': u'04d', u'description': u'broken clouds'}],
       [{u'main': u'Clouds', u'id': 804, u'icon': u'04d', u'description': u'overcast clouds'}],
       [{u'main': u'Clouds', u'id': 804, u'icon': u'04d', u'description': u'overcast clouds'}] ]

for weather in weatherArray:
  for i in weather:
   print(i['main'])
   print(i['id'])
   print(i['icon'])
   print(i['description'])

  print('\n')

Use the above code to loop through the contents of weatherArray and add them in different columns.

Upvotes: -1

Related Questions