Kallol

Reputation: 2189

Group rows in a pandas data frame when the difference between consecutive rows is less than a value

I have a data frame like this,

col1    col2    col3
 1        2       3
 2        3       4
 4        2       3
 7        2       8
 8        3       4
 9        3       3
 15       1       12

Now I want to group rows where the difference between two consecutive col1 values is less than 3, sum the other columns' values, and create another column (col4) holding the last col1 value of each group. So the final data frame will look like this:

col1    col2    col3    col4
  1       7       10     4
  7       8       15     9

Using a for loop to do this is tedious; I am looking for a pandas shortcut to do it efficiently.

Upvotes: 2

Views: 415

Answers (1)

Quang Hoang

Reputation: 150735

You can do a named aggregation on groupby:

(df.groupby(df.col1.diff().ge(3).cumsum(), as_index=False)
   .agg(col1=('col1','first'),
        col2=('col2','sum'),
        col3=('col3','sum'),
        col4=('col1','last'))
)

Output:

   col1  col2  col3  col4
0     1     7    10     4
1     7     8    15     9
2    15     1    12    15
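
To see why the grouping key works, it may help to build it one step at a time. This is only a sketch: the frame is retyped by hand from the question, and the names gaps, starts_new and key are made up for illustration.

```python
import pandas as pd

# The question's data, retyped by hand (assumed to match the asker's frame).
df = pd.DataFrame({
    "col1": [1, 2, 4, 7, 8, 9, 15],
    "col2": [2, 3, 2, 2, 3, 3, 1],
    "col3": [3, 4, 3, 8, 4, 3, 12],
})

# Difference from the previous row; the first row has no predecessor, so it is NaN.
gaps = df["col1"].diff()

# True wherever the gap is 3 or more, i.e. where a new group should start.
# NaN compares as False, so the first row stays in the first group.
starts_new = gaps.ge(3)

# A running count of the True flags gives every row a group label.
key = starts_new.cumsum()

print(key.tolist())  # [0, 0, 0, 1, 1, 1, 2]
```

Rows sharing a label are aggregated together, which is why the row with col1 equal to 15 ends up in a group of its own.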

Update: without named aggregation, you can do something like this:

groups = df.groupby(df.col1.diff().ge(3).cumsum())
new_df = groups.agg({'col1':'first', 'col2':'sum','col3':'sum'})
new_df['col4'] = groups['col1'].last()
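
The expected output in the question has only two rows, while the code above also returns the single-row group for 15. If that row is not wanted (an assumption about the asker's intent, not something the question states), one possible sketch is to filter on group size afterwards. The frame is retyped by hand from the question.

```python
import pandas as pd

# The question's data, retyped by hand (assumed to match the asker's frame).
df = pd.DataFrame({
    "col1": [1, 2, 4, 7, 8, 9, 15],
    "col2": [2, 3, 2, 2, 3, 3, 1],
    "col3": [3, 4, 3, 8, 4, 3, 12],
})

groups = df.groupby(df["col1"].diff().ge(3).cumsum())
new_df = groups.agg({"col1": "first", "col2": "sum", "col3": "sum"})
new_df["col4"] = groups["col1"].last()

# Assumption: groups made of a single row should be dropped.
# groups.size() is indexed by the same group labels as new_df, so the mask aligns.
new_df = new_df[groups.size() > 1]

# The index is also named col1, so it is dropped rather than turned into a column.
new_df = new_df.reset_index(drop=True)

print(new_df)
```

Leaving out the size filter keeps all three groups, as in the output shown for the named aggregation.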

Upvotes: 2
