Reputation: 179
I am making a counter column that increases only when the values at (i-1) are different than that of (i), but it resets when user_id changes.
Input:
user_id sc_id
1 100
1 100
1 101
1 102
2 100
2 101
3 101
3 103
Below is my code:
df['subcat_counter'] = 1
for i in range(1,len(df)):
if df.sc_id[i-1]==df.sc_id[i] and df['user_id'][i-1]==df['user_id'][i]:
df.at[i,'subcat_counter']=df.subcat_counter[i-1]
else:
df.at[i,'subcat_counter']=df.subcat_counter[i-1]+1
Below is the desired output:
user_id sc_id subcat_counter
1 100 1
1 100 1
1 101 2
1 102 3
2 100 1
2 101 2
3 101 1
3 103 2
Upvotes: 1
Views: 288
Reputation: 153460
You can do it this way using groupby
df['subcat_counter'] = (df.groupby('user_id')['sc_id']
.transform(lambda x: x.diff().gt(0).cumsum() + 1))
Output:
user_id sc_id subcat_counter
0 1 100 1
1 1 100 1
2 1 101 2
3 1 102 3
4 2 100 1
5 2 101 2
6 3 101 1
7 3 103 2
Upvotes: 0
Reputation: 42886
We can groupby
with cumcount
and before that drop duplicates so each same, user_id
and sc_id
get the same subcat_counter
. After that we fillna
with forwardfill (ffill)
:
df['subcat_counter'] = df.drop_duplicates(['user_id', 'sc_id'])\
.groupby(['user_id'])['sc_id']\
.cumcount()+1
df.fillna(method='ffill', inplace=True)
print(df)
user_id sc_id subcat_counter
0 1 100 1.0
1 1 100 1.0
2 1 101 2.0
3 1 102 3.0
4 2 100 1.0
5 2 101 2.0
6 3 101 1.0
7 3 103 2.0
Upvotes: 2