Reputation: 11
I am currently working on extracting latitude and longitude values of "point" field from json . I want to store each field value in two new columns. The JSON in pandas dataframe is as follows:
user_id timestamp point
0 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486}
1 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486}
2 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486}
3 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486}
4 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486}
I tried in the following function and works:
row = test["point"][0]
type(row)
row["lat"]
But now I want to extract with the another function and I get the new columns with:
<function getLat at 0x0000000006AE2950>
<function getLon at 0x0000000006A58BF8>
The function is:
def getLat(point):
return point["lat"]
def getLon(point):
return point["lon"]
test["lat"] = test["point"].apply(lambda row: getLat)
test["lon"] = test["point"].apply(lambda row: getLon)
And I get the following table:
user_id timestamp point lat lon
0 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486} <function getLat at 0x0000000006AE2950> <function getLon at 0x0000000006A58BF8>
1 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486} <function getLat at 0x0000000006AE2950> <function getLon at 0x0000000006A58BF8>
2 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486} <function getLat at 0x0000000006AE2950> <function getLon at 0x0000000006A58BF8>
3 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486} <function getLat at 0x0000000006AE2950> <function getLon at 0x0000000006A58BF8>
4 58d3ac2de04bd5249b4f6600 14567410 {'lon': 2.9546, 'lat': 41.5486} <function getLat at 0x0000000006AE2950> <function getLon at 0x0000000006A58BF8>
Where is my code incorrect?
Thanks
Upvotes: 1
Views: 613
Reputation: 862601
Use only function name:
test["lat"] = test["point"].apply(getLat)
It is same like:
test["lat"] = test["point"].apply(lambda x: getLat(x))
Or use only lambda function:
test["lat"] = test["point"].apply(lambda x: x["lat"])
print (test)
user_id timestamp point \
0 58d3ac2de04bd5249b4f6600 14567410 {'lat': 41.5486, 'lon': 2.9546}
1 58d3ac2de04bd5249b4f6600 14567410 {'lat': 41.5486, 'lon': 2.9546}
2 58d3ac2de04bd5249b4f6600 14567410 {'lat': 41.5486, 'lon': 2.9546}
3 58d3ac2de04bd5249b4f6600 14567410 {'lat': 41.5486, 'lon': 2.9546}
4 58d3ac2de04bd5249b4f6600 14567410 {'lat': 41.5486, 'lon': 2.9546}
lat
0 41.5486
1 41.5486
2 41.5486
3 41.5486
4 41.5486
Solution with DataFrame
constructor, pop
delete column in original DataFrame and join
is for append to original:
test = test.join(pd.DataFrame(test.pop('point').values.tolist(), index=test.index))
print (test)
user_id timestamp lat lon
0 58d3ac2de04bd5249b4f6600 14567410 41.5486 2.9546
1 58d3ac2de04bd5249b4f6600 14567410 41.5486 2.9546
2 58d3ac2de04bd5249b4f6600 14567410 41.5486 2.9546
3 58d3ac2de04bd5249b4f6600 14567410 41.5486 2.9546
4 58d3ac2de04bd5249b4f6600 14567410 41.5486 2.9546
Another idea is use json_normalize
:
from pandas.io.json import json_normalize
import json
with open('file.json') as data_file:
d = json.load(data_file)
df = json_normalize(d)
print (df)
Upvotes: 1