pandas: groupby only if condition is satisfied

Question

I have the following multi-indexed DataFrame:

        m     dist

a  a    2     5
   b    3     8
   c    4     12
   d    2     3
b  a    2     5
   b    3     8
   c    4     14
   d    2     27

I want to calculate a new column s based on an algorithm. For example, for (a,a) the algorithm would be:

select all rows that have the same level 0 index (including the row itself): --> (a,a), (a,b), (a,c), (a,d)

from this selection: select the rows that have dist <= own dist --> (a,a) and (a,d)

get the sum of column m of this selection --> 2 + 2 = 4

This would result in the following dataframe:

        m     dist  s

a  a    2     5     4
   b    3     8     7
   c    4     12    11
   d    2     3     2
b  a    2     5     2
   b    3     8     5
   c    4     14    9
   d    2     27    11

This would be easy by looping through the rows, creating a new dataframe for every row, selecting the rows with the right dist and doing a groupby. But with my current dataset this is just too slow. Could this be accomplished with some nice groupby/lambda magic?

akuiper · Accepted Answer

You can sort the data frame by dist and then do a cumsum on the column m:

df['s'] = df.sort_values('dist').groupby(level=0).m.cumsum()
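As a quick check against the sample data from the question, here is a minimal sketch that rebuilds the frame and applies the one-liner. Building `df` with `pd.MultiIndex.from_product` is an assumption about how the original data is laid out; only the last assignment comes from the answer itself.

```python
import pandas as pd

# Rebuild the sample frame from the question (the construction is an assumption).
index = pd.MultiIndex.from_product([["a", "b"], ["a", "b", "c", "d"]])
df = pd.DataFrame(
    {"m": [2, 3, 4, 2, 2, 3, 4, 2], "dist": [5, 8, 12, 3, 5, 8, 14, 27]},
    index=index,
)

# Sort by dist, then take a running sum of m within each level-0 group.
# The result keeps the original index labels, so assignment aligns it back
# to the unsorted frame.
df["s"] = df.sort_values("dist").groupby(level=0).m.cumsum()
print(df)
```

The sort only decides the order in which the running sum accumulates; the assignment back to `df` relies on index alignment, so the row order of `df` itself is unchanged.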

This assumes there are no duplicate dist values within a group. If there are, apply a further transformation to column s, grouped by the first index level and dist, so that tied rows all receive the same total:

df['s'] = df.groupby([df.index.get_level_values(0), 'dist']).s.transform('max')
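To illustrate the tie handling, here is a small sketch on made-up data (the values below are hypothetical, not from the question) where two rows in the same group share a dist. It compares the two-step result with a slow row-by-row reference. Note that taking the max of the running sum only gives the right total when m is non-negative, which is an assumption here.

```python
import pandas as pd

# Hypothetical data with tied dist values inside each level-0 group.
index = pd.MultiIndex.from_tuples(
    [("a", "a"), ("a", "b"), ("a", "c"), ("b", "a"), ("b", "b")]
)
df = pd.DataFrame({"m": [2, 3, 4, 1, 6], "dist": [5, 5, 8, 7, 7]}, index=index)

# Step 1: running sum of m within each group, in order of dist.
df["s"] = df.sort_values("dist").groupby(level=0).m.cumsum()
# Step 2: rows tied on dist within a group all take the largest running sum.
df["s"] = df.groupby([df.index.get_level_values(0), "dist"]).s.transform("max")

# Slow reference: for each row, sum m over rows of the same group
# whose dist is less than or equal to the row's own dist.
level0 = df.index.get_level_values(0)
reference = [
    df.loc[(level0 == key[0]) & (df["dist"] <= row["dist"]), "m"].sum()
    for key, row in df.iterrows()
]
print(df)
print(reference)
```

Without the second step, one of the tied rows would be missing the other's m, depending on the order the sort happens to put them in.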
