Reputation: 359
I have a dataframe that looks like -
block lat lon
0 0 112 50
1 0 112 50
2 0 112 50
3 1 105 20
4 1 105 20
5 2 130 30
and I want to first groupby block and then apply a function to the lat lon columns. eg
df['location_id'] = df.groupby('block').apply(lambda x: get_location_id(x['lat'], x['lon'])
For each lat/lon my function returns an ID. I want a new column that has that ID. When I try above it doesn't work, and the lambda function seems to dislike me using axis=1 or similar. ie -
block lat lon location_id
0 0 112 50 1
1 0 112 50 1
2 0 112 50 1
3 1 105 20 23
4 1 105 20 23
5 2 130 30 15
I'd like to avoid just applying the function to the ungrouped dataframe because my dataset is quite large and that will be slow.
edit: the function returns a string id. Takes in lat lon and returns a single string value.
Upvotes: 0
Views: 54
Reputation: 174
Depending on how "large" your "large" Dataset is, there might be different solutions... And I'm not 100% certain you can (or should) do what you want with groupby. I'd suggest (and I'm relatively sure that it will work even in distributed environments) to do the following:
block,lat,lon
combinations.Edit: Pandas is quite fine with handling datasets that take gigabytes in RAM as long as you have the ram, so just applying the functio might be a lot more viable than you think.
Upvotes: 1