alex3465
alex3465

Reputation: 419

Group-by then row shift in each group

I have dataset that can be grouped with following:

  df.groupby(df.batch.str[:7])

different group sizes are

df.groupby(df.batch.str[:7]).size().unique()

 array([1, 2, 3, 4, 5, 6, 7])

lets say I will take any group with size 4 arranged with time column ,

    col1     col2  time 
0   rt_2345  NaN   1:00
1   rt_2345  3.0   2:00
2   rt_2345  4.0   3:00
3   rt_2345  46.0  4:00

it will have 4 rows, the col2 of each group is NaN and I want to shift 1 row up for every group.

Goal

   col1      col2  time
0   rt_2345  3.0   1:00
1   rt_2345  4.0   2:00
2   rt_2345  46.0  3:00
3   rt_2345  NaN   4:00

This can be down with following code with gruop sizes of 2 but not valid for 3,4,5,6,7

.fillna(df.groupby(df.batch.str[:7])['col1'].bfill()))

Upvotes: 1

Views: 813

Answers (2)

jezrael
jezrael

Reputation: 862441

Use DataFrameGroupBy.shift:

df['col2'] = df.groupby(df.batch.str[:7])['col2'].shift(-1)

EDIT: If need sorting per multiple columns is possible create column a by first 7 values of batch and pass to DataFrame.sort_values:

df = df.assign(a = df.batch.str[:7]).sort_values(['a', 'time'], 
                                                 ascending=[True, False])

df['col2'] = df.groupby('a')['col2'].shift(-1)

Upvotes: 1

BENY
BENY

Reputation: 323226

We can try use sorted with key in this situation you do not need worry about how many NaN in the top

df['new'] = df.groupby(df.col1.str[:7])['col2'].apply(lambda x : sorted(x, key=pd.isnull) ).explode().values
df
Out[145]: 
      col1  col2  time  new
0  rt_2345   NaN  1:00    3
1  rt_2345   3.0  2:00    4
2  rt_2345   4.0  3:00   46
3  rt_2345  46.0  4:00  NaN

Upvotes: 1

Related Questions