Group-by then row shift in each group

Question

I have dataset that can be grouped with following:

  df.groupby(df.batch.str[:7])

different group sizes are

df.groupby(df.batch.str[:7]).size().unique()

 array([1, 2, 3, 4, 5, 6, 7])

lets say I will take any group with size 4 arranged with time column ,

    col1     col2  time 
0   rt_2345  NaN   1:00
1   rt_2345  3.0   2:00
2   rt_2345  4.0   3:00
3   rt_2345  46.0  4:00

it will have 4 rows, the col2 of each group is NaN and I want to shift 1 row up for every group.

Goal

   col1      col2  time
0   rt_2345  3.0   1:00
1   rt_2345  4.0   2:00
2   rt_2345  46.0  3:00
3   rt_2345  NaN   4:00

This can be down with following code with gruop sizes of 2 but not valid for 3,4,5,6,7

.fillna(df.groupby(df.batch.str[:7])['col1'].bfill()))

jezrael · Accepted Answer

Use DataFrameGroupBy.shift:

df['col2'] = df.groupby(df.batch.str[:7])['col2'].shift(-1)

EDIT: If need sorting per multiple columns is possible create column a by first 7 values of batch and pass to DataFrame.sort_values:

df = df.assign(a = df.batch.str[:7]).sort_values(['a', 'time'], 
                                                 ascending=[True, False])

df['col2'] = df.groupby('a')['col2'].shift(-1)

Group-by then row shift in each group

Answers (2)

Related Questions