Reputation: 113
I have the following multi-indexed DataFrame:
     m  dist
a a  2     5
  b  3     8
  c  4    12
  d  2     3
b a  2     5
  b  3     8
  c  4    14
  d  2    27
I want to calculate a new column s based on an algorithm. For example, for row (a,a) the algorithm would be:
select all rows that have the same level 0 index, including the row itself: --> (a,a), (a,b), (a,c), (a,d)
from this selection, select the rows that have dist <= the row's own dist: --> (a,a) and (a,d)
get the sum of column m over this selection: --> 2 + 2 = 4
This would result in the following DataFrame:
     m  dist   s
a a  2     5   4
  b  3     8   7
  c  4    12  11
  d  2     3   2
b a  2     5   2
  b  3     8   5
  c  4    14   9
  d  2    27  11
This would be easy by looping through the rows, creating a new dataframe for every row, selecting the rows with the right dist and doing a groupby. But with my current dataset this is just too slow. Could this be accomplished with some nice groupby/lambda magic?
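For reference, the slow row-by-row version described above might look roughly like the sketch below. The function name slow_s is made up for illustration, and the sample data is rebuilt from the table in the question; it is only meant as a baseline to check faster solutions against.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
index = pd.MultiIndex.from_product([["a", "b"], ["a", "b", "c", "d"]])
df = pd.DataFrame(
    {"m": [2, 3, 4, 2, 2, 3, 4, 2], "dist": [5, 8, 12, 3, 5, 8, 14, 27]},
    index=index,
)


def slow_s(df):
    """Hypothetical helper: quadratic row-by-row reference implementation."""
    level0 = df.index.get_level_values(0)
    result = []
    for (key0, _), row in df.iterrows():
        # Step 1: rows sharing the same level 0 index value.
        same_group = df[level0 == key0]
        # Step 2: keep rows whose dist is <= this row's dist.
        selected = same_group[same_group["dist"] <= row["dist"]]
        # Step 3: sum column m over the selection.
        result.append(selected["m"].sum())
    return pd.Series(result, index=df.index)


df["s"] = slow_s(df)
print(df)
```

Every row filters its whole group again, so the cost grows quadratically with group size, which is why this gets slow on larger data.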
Upvotes: 1
Views: 615
Reputation: 214957
You can sort the data frame by dist and then do a cumsum on the column m:
df['s'] = df.sort_values('dist').groupby(level=0).m.cumsum()
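As a quick check, here is a self-contained sketch that applies the sort-then-cumsum line to the sample data from the question. The DataFrame construction is my own reconstruction of that table, not code from the original post.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
index = pd.MultiIndex.from_product([["a", "b"], ["a", "b", "c", "d"]])
df = pd.DataFrame(
    {"m": [2, 3, 4, 2, 2, 3, 4, 2], "dist": [5, 8, 12, 3, 5, 8, 14, 27]},
    index=index,
)

# Sort by dist, then take a running sum of m within each level 0 group.
# The result is aligned back to df by its (unique) index on assignment,
# so the original row order of df is preserved.
df["s"] = df.sort_values("dist").groupby(level=0)["m"].cumsum()
print(df)
```

Because the assignment aligns on the index, there is no need to sort the result back into the original order.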
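The cumsum approach assumes there are no duplicate dist values within a level 0 group. If there are, rows with tied dist get different partial sums, so as a further step take the maximum of s within each combination of first index level and dist (this works as long as m is non-negative):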
df['s'] = df.groupby([df.index.get_level_values(0), 'dist']).s.transform('max')
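To illustrate the tie handling, here is a small sketch on made-up data (not from the question) in which two rows of the same group share dist = 5. It assumes m is non-negative, so that the largest partial sum among tied rows is the full sum.

```python
import pandas as pd

# Made-up data: rows (a,a) and (a,b) have the same dist.
index = pd.MultiIndex.from_tuples([("a", "a"), ("a", "b"), ("a", "c")])
df = pd.DataFrame({"m": [2, 3, 4], "dist": [5, 5, 8]}, index=index)

# Step 1: running sum of m within each level 0 group, ordered by dist.
# The two tied rows get different partial sums at this point.
df["s"] = df.sort_values("dist").groupby(level=0)["m"].cumsum()

# Step 2: give every row in a (level 0, dist) tie the largest partial sum.
df["s"] = df.groupby([df.index.get_level_values(0), "dist"])["s"].transform("max")
print(df)
```

Both tied rows end up with 2 + 3 = 5, and the row with dist 8 gets 2 + 3 + 4 = 9, matching the rule "sum m over all rows in the group with dist <= own dist".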
Upvotes: 3