Reputation: 16987
What I am trying to do is extract elevation data from a google maps API along a path specified by latitude and longitude coordinates as follows:
from urllib2 import Request, urlopen
import json
path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
This gives me a data that looks like this:
elevations.splitlines()
['{',
' "results" : [',
' {',
' "elevation" : 243.3462677001953,',
' "location" : {',
' "lat" : 42.974049,',
' "lng" : -81.205203',
' },',
' "resolution" : 19.08790397644043',
' },',
' {',
' "elevation" : 244.1318664550781,',
' "location" : {',
' "lat" : 42.974298,',
' "lng" : -81.19575500000001',
' },',
' "resolution" : 19.08790397644043',
' }',
' ],',
' "status" : "OK"',
'}']
when putting into as DataFrame here is what I get:
pd.read_json(elevations)
and here is what I want:
I'm not sure if this is possible, but mainly what I am looking for is a way to be able to put the elevation, latitude and longitude data together in a pandas dataframe (doesn't have to have fancy mutiline headers).
If any one can help or give some advice on working with this data that would be great! If you can't tell I haven't worked much with json data before...
EDIT:
This method isn't all that attractive but seems to work:
data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
lat.append(result[u'location'][u'lat'])
lng.append(result[u'location'][u'lng'])
el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T
ends up dataframe having columns latitude, longitude, elevation
Upvotes: 258
Views: 880552
Reputation: 765
Refer MongoDB documentation, I got the following code:
from pandas import DataFrame
df = DataFrame('Your json string')
Upvotes: 0
Reputation: 467
use Json to load the file and convert it to a pandas dataframe using DataFrame.from_dict function
import json
import pandas as pd
json_string = '{ "name":"John", "age":30, "car":"None" }'
a_json = json.loads(json_string)
print(a_json)
dataframe = pd.DataFrame.from_dict(a_json)
Upvotes: 14
Reputation: 7279
Rumble supports JSON natively with JSONiq and runs on Spark, managing DataFrames internally so you don't need to -- even if the data isn't fully structured:
let $coords := "42.974049,-81.205203%7C42.974298,-81.195755"
let $request := json-doc("http://maps.googleapis.com/maps/api/elevation/json?locations="||$coords||"&sensor=false")
for $obj in $request.results[]
return {
"latitude" : $obj.location.lat,
"longitude" : $obj.location.lng,
"elevation" : $obj.elevation
}
The results can be exported to CSV and then reopened in any other host language as a DataFrame.
Upvotes: 0
Reputation: 861
Optimization of the accepted answer:
The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:
import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'
response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)
Output:
_id HourUTC HourDK ... ElbasAveragePriceEUR ElbasMaxPriceEUR ElbasMinPriceEUR
0 264028 2019-01-01T00:00:00+00:00 2019-01-01T01:00:00 ... NaN NaN NaN
1 138428 2017-09-03T15:00:00+00:00 2017-09-03T17:00:00 ... 33.28 33.4 32.0
2 138429 2017-09-03T16:00:00+00:00 2017-09-03T18:00:00 ... 35.20 35.7 34.9
3 138430 2017-09-03T17:00:00+00:00 2017-09-03T19:00:00 ... 37.50 37.8 37.3
4 138431 2017-09-03T18:00:00+00:00 2017-09-03T20:00:00 ... 39.65 42.9 35.3
.. ... ... ... ... ... ... ...
995 139290 2017-10-09T13:00:00+00:00 2017-10-09T15:00:00 ... 38.40 38.4 38.4
996 139291 2017-10-09T14:00:00+00:00 2017-10-09T16:00:00 ... 41.90 44.3 33.9
997 139292 2017-10-09T15:00:00+00:00 2017-10-09T17:00:00 ... 46.26 49.5 41.4
998 139293 2017-10-09T16:00:00+00:00 2017-10-09T18:00:00 ... 56.22 58.5 49.1
999 139294 2017-10-09T17:00:00+00:00 2017-10-09T19:00:00 ... 56.71 65.4 42.2
PS: API is for Danish electricity prices
Upvotes: 36
Reputation: 481
I prefer a more generic method in which may be user doesn't prefer to give key 'results'. You can still flatten it by using a recursive approach of finding key having nested data or if you have key but your JSON is very nested. It is something like:
from pandas import json_normalize
def findnestedlist(js):
for i in js.keys():
if isinstance(js[i],list):
return js[i]
for v in js.values():
if isinstance(v,dict):
return check_list(v)
def recursive_lookup(k, d):
if k in d:
return d[k]
for v in d.values():
if isinstance(v, dict):
return recursive_lookup(k, v)
return None
def flat_json(content,key):
nested_list = []
js = json.loads(content)
if key is None or key == '':
nested_list = findnestedlist(js)
else:
nested_list = recursive_lookup(key, js)
return json_normalize(nested_list,sep="_")
key = "results" # If you don't have it, give it None
csv_data = flat_json(your_json_string,root_key)
print(csv_data)
Upvotes: 1
Reputation: 16987
I found a quick and easy solution to what I wanted using json_normalize()
included in pandas 1.01
.
from urllib2 import Request, urlopen
import json
import pandas as pd
path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])
This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
Upvotes: 319
Reputation: 11
Once you have the flattened DataFrame
obtained by the accepted answer, you can make the columns a MultiIndex
("fancy multiline header") like this:
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])
Upvotes: 1
Reputation: 188
Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.
# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize
class DFConverter:
#Converts the input JSON to a DataFrame
def convertToDF(self,dfJSON):
return(json_normalize(dfJSON))
#Converts the input DataFrame to JSON
def convertToJSON(self, df):
resultJSON = df.to_json(orient='records')
return(resultJSON)
Upvotes: 8
Reputation: 193
#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()
>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
... strlist = "["+x+"]"
... datalist=json.loads(strlist)
... for y in datalist:
... print(type(y))
... print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
Upvotes: 1
Reputation: 3170
Just a new version of the accepted answer, as python3.x
does not support urllib2
from requests import request
import json
from pandas.io.json import json_normalize
path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])
Upvotes: 19
Reputation: 839
billmanH's solution helped me but didn't work until i switched from:
n = data.loc[row,'json_column']
to:
n = data.iloc[[row]]['json_column']
here's the rest of it, converting to a dictionary is helpful for working with json data.
import json
for row in range(len(data)):
n = data.iloc[[row]]['json_column'].item()
jsonDict = json.loads(n)
if ('mykey' in jsonDict):
display(jsonDict['mykey'])
Upvotes: 1
Reputation: 3968
Check this snip out.
# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
dict_train = json.load(train_file)
# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
Hope it helps :)
Upvotes: 50
Reputation: 1426
The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful Json is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.
for row in range(len(data)):
#First I load the dict (one at a time)
n = data.loc[row,'dict_column']
#Now I make a new column that pulls out the data that I want.
data.loc[row,'new_column'] = n.get('key')
Upvotes: 5
Reputation: 1519
You could first import your json data in a Python dictionnary :
data = json.loads(elevations)
Then modify data on the fly :
for result in data['results']:
result[u'lat']=result[u'location'][u'lat']
result[u'lng']=result[u'location'][u'lng']
del result[u'location']
Rebuild json string :
elevations = json.dumps(data)
Finally :
pd.read_json(elevations)
You can, also, probably avoid to dump data back to a string, I assume Panda can directly create a DataFrame from a dictionnary (I haven't used it since a long time :p)
Upvotes: 18