excelater
excelater

Reputation: 117

Column that counts based on another one

I have a data frame like this.

 df = pd.DataFrame({'transaction_id':[12565,12565,12743,12743,13456,13456,13856],
                'rep_id':[560,560,560,560,287,287,287]})

I want to create a new column with a counter like below.

       transaction_id   rep_id  trans_num
    0           12565      560          1
    1           12565      560          1
    2           12743      560          2
    3           12743      560          2
    4           13456      287          1
    5           13456      287          1
    6           13856      287          2

Upvotes: 2

Views: 48

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150805

Based on your data (transaction_id are different if rep_id are different) we can also do:

df['new'] = (df['transaction_id'].ne(df['transaction_id'].shift())
    .groupby(df['rep_id']).cumsum()
)

Update: You can also use rank, although it behaves a bit differently:

df.groupby('rep_id')['transaction_id'].rank('dense').astype(int)

Output:

   transaction_id  rep_id  new
0           12565     560    1
1           12565     560    1
2           12743     560    2
3           12743     560    2
4           13456     287    1
5           13456     287    1
6           13856     287    2

Upvotes: 2

BENY
BENY

Reputation: 323376

Try with transform + factorize

df['new']=df.groupby('rep_id').transaction_id.transform(lambda x : pd.factorize(x)[0]+1)
df
Out[389]: 
   transaction_id  rep_id  new
0           12565     560    1
1           12565     560    1
2           12743     560    2
3           12743     560    2
4           13456     287    1
5           13456     287    1
6           13856     287    2

Upvotes: 8

Related Questions