Pi-R
Pi-R

Reputation: 654

use groupby and custom agg in a dataframe pandas

I have this dataframe :

id start end
1  1     2
1  13    27
1  30    35
1  36    40
2  2     5
2  8     10
2  25    30

I want to groupby over id and aggregate rows where difference of end of n-1 row and start of n row is less than 10 for example. I already find a way using a loop but it's far too long with over a million rows.

So the expected outcome would be :

id start end
1  1     2
1  13    40
2  2     10
2  25    30

First I can get the required difference by using df['diff']=df['start'].shift(-1)-df['end']. How can I gather ids based on the condition for each different id ?

Thanks !

Upvotes: 0

Views: 68

Answers (1)

jezrael
jezrael

Reputation: 862901

I believe you can create groups by suntract shifted end by DataFrameGroupBy.shift with greater like 10 and cumulative sum and pass to GroupBy.agg:

g = df['start'].sub(df.groupby('id')['end'].shift()).gt(10).cumsum()
   
df = (df.groupby(['id',g])
        .agg({'start':'first', 'end': 'last'})
        .reset_index(level=1, drop=True)
        .reset_index())
print (df)
   id  start  end
0   1      1    2
1   1     13   40
2   2      2   10
3   2     25   30

Upvotes: 1

Related Questions