Roshankumar
Roshankumar

Reputation: 365

incremental counter to a dataframe based on unique values from a column

Existing Dataframe :

Unique_Id          sms_sent_date         
A                    11-01-2022                   
A                    11-01-2022                   
A                    12-01-2022                   
A                    13-01-2022                   

B                    02-02-2022                   
B                    03-02-2022                   
B                    06-02-2022                   
B                    06-02-2022                   

Expected Dataframe :

Unique_Id          sms_sent_date         unique_sms_counter
A                    11-01-2022                   1
A                    11-01-2022                   1
A                    12-01-2022                   2
A                    13-01-2022                   3

B                    02-02-2022                   1
B                    03-02-2022                   2
B                    06-02-2022                   3
B                    06-02-2022                   3

I trying to add the smsq_counter which shall count the unique sms sent to a unique Id. tried this df.groupby(['Unique_Id','sms_sent_date']).cumcount() but stuck with applying it on entire dataframe

Upvotes: 2

Views: 162

Answers (3)

rhug123
rhug123

Reputation: 8778

Another option:

s = df.groupby(df.columns.tolist()).ngroup()

df.assign(unique_sms_counter = (s - s.groupby(df['Unique_Id']).transform('min')).add(1))

Output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150805

You can use factorize on groupby:

df['unique_sms_counter'] = df.groupby(['Unique_Id'])['sms_sent_date'].transform(lambda x: x.factorize()[0]+1)

Or, if your date is actual datetime type, you can use rank:

df['sms_sent_date'] = pd.to_datetime(df['sms_sent_date'], dayfirst=True)

df.groupby('Unique_Id')['sms_sent_date'].rank('dense').astype(int)

Output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3

Upvotes: 1

mozway
mozway

Reputation: 262304

You can use:

df['unique_sms_counter'] = (
  # get non duplicates
 (~df.duplicated(['Unique_Id', 'sms_sent_date']))
  # increment per group
 .groupby(df['Unique_Id']).cumsum()
)

output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3

Upvotes: 1

Related Questions