Compare values in dataframe column by fractions

Question

I have a dataset:

I need to compare the "index" value of the last month (within each "id" group) with the two previous months and check whether it is the same.
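One way to read this requirement is as a per-row check. Within each `id`, look at the current month together with the two months before it, and flag the row if those three `index` values are not all equal. Below is a minimal sketch of that interpretation. It assumes the rows can be sorted by `month` within each `id` and that a row with fewer than three months of history should stay `NaN`. The name `n_distinct` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [202001, 202002, 202003, 202004] * 3,
                   'index': [3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3]})

# Make sure "previous months" really are the preceding rows of each id.
df = df.sort_values(['id', 'month'])

# Distinct index values in the current month plus the two before it;
# NaN until an id has three months of history (rolling window of 3).
n_distinct = (df.groupby('id')['index']
                .transform(lambda s: s.rolling(3).apply(lambda w: w.nunique(), raw=False)))

# 1.0 if the three months differ, 0.0 if they are all the same, NaN otherwise.
df['check'] = n_distinct.gt(1).astype(float).where(n_distinct.notna())
print(df)
```

A rolling window is used here because it compares against both previous months at once, rather than only the one two rows back.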

I'm trying this code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [202001, 202002, 202003, 202004, 202001, 202002, 202003, 202004, 202001, 202002, 202003, 202004],
                   'index': [3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3]})

df['check']=np.where(df.sort_values(['id', 'month'])
                             .groupby('id')['index']
                             .apply(lambda x: x.shift(3))
                             .transform('nunique')>1,1,0)

It raises an error: ValueError: transforms cannot produce aggregated results

Without "apply" the code works. What am I doing wrong?
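The likely cause is that `.apply(lambda x: x.shift(3))` already returns an ordinary Series, not a GroupBy object. The `.transform('nunique')` that follows is therefore `Series.transform`, which must return one value per row. Because `nunique` collapses the whole Series to a single number, pandas raises a `ValueError`. The exact message varies between pandas versions. Without `apply`, `transform` is called on the `SeriesGroupBy`, which broadcasts each group's count back to every row of that group.

The small sketch below contrasts the two cases. It also shows one way to keep a shift: re-group the shifted Series by `df['id']` before calling `transform`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [202001, 202002, 202003, 202004] * 3,
                   'index': [3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3]})

# GroupBy.shift gives the same plain (ungrouped) Series that apply(shift) does.
shifted = df.groupby('id')['index'].shift(3)

# Series.transform needs one output per row; nunique returns one number.
try:
    shifted.transform('nunique')
    failed = False
except ValueError:
    failed = True

# Re-grouping the shifted Series makes transform broadcast per id again.
regrouped = shifted.groupby(df['id']).transform('nunique')

# Without the shift, transform runs on the SeriesGroupBy directly.
per_group = df.groupby('id')['index'].transform('nunique')
df['check'] = np.where(per_group > 1, 1, 0)

print(failed)
print(df)
```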

The desired output is something like this:

Thank you for any suggestions.

anky · Accepted Answer

You can use groupby.shift to get the value from two rows earlier within each id, subtract it from "index", and then clip the result at an upper limit of 1:

df['check'] = df['index'].sub(df.groupby("id")['index'].shift(2)).clip(upper=1)

print(df)

    id   month  index  check
0    1  202001      3    NaN
1    1  202002      3    NaN
2    1  202003      3    0.0
3    1  202004      3    0.0
4    2  202001      4    NaN
5    2  202002      4    NaN
6    2  202003      5    1.0
7    2  202004      5    1.0
8    3  202001      2    NaN
9    3  202002      3    NaN
10   3  202003      3    1.0
11   3  202004      3    0.0
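One caveat applies to `clip(upper=1)`: it only caps increases. If `index` drops, for example from 5 to 4, the row gets a negative value instead of 1. When the goal is simply "same or not", comparing with `ne` avoids this. On the question's sample data, where `index` never decreases, both approaches give the same result as the table above.

The sketch below assumes, as the answer does, that rows are already in `month` order within each `id`, because `shift(2)` works on row position rather than on the month value. The single group with `id` 4 is hypothetical data, added only to show a decrease.

```python
import pandas as pd

# Hypothetical group whose index decreases from 5 to 4.
df = pd.DataFrame({'id': [4, 4, 4],
                   'month': [202001, 202002, 202003],
                   'index': [5, 5, 4]})

# Value of 'index' two rows earlier within the same id (NaN for the first two).
prev2 = df.groupby('id')['index'].shift(2)

# The answer's formula: 4 - 5 = -1, which clip(upper=1) leaves as -1.
df['check_clip'] = df['index'].sub(prev2).clip(upper=1)

# 1.0 if the value differs from two months earlier, 0.0 if equal,
# NaN where there is no value two months back.
df['check_ne'] = df['index'].ne(prev2).astype(float).where(prev2.notna())
print(df)
```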
