Reputation: 2189
I have a data frame like this,
col1 col2 col3
1 2 3
2 3 4
4 2 3
7 2 8
8 3 4
9 3 3
15 1 12
Now I want to group those rows where there difference between two consecutive col1 rows is less than 3. and sum other column values, create another column(col4) with the last value of the group, So the final data frame will look like,
col1 col2 col3 col4
1 7 10 4
7 8 15 9
using for loop to do this is tedious, looking for some pandas shortcuts to do it most efficiently.
Upvotes: 2
Views: 415
Reputation: 150735
You can do a named aggregation on groupby:
(df.groupby(df.col1.diff().ge(3).cumsum(), as_index=False)
.agg(col1=('col1','first'),
col2=('col2','sum'),
col3=('col3','sum'),
col4=('col1','last'))
)
Output:
col1 col2 col3 col4
0 1 7 10 4
1 7 8 15 9
2 15 1 12 15
update without named aggregation you can do some thing like this:
groups = df.groupby(df.col1.diff().ge(3).cumsum())
new_df = groups.agg({'col1':'first', 'col2':'sum','col3':'sum'})
new_df['col4'] = groups['col1'].last()
Upvotes: 2