ursteiner

Reputation: 149

Python add new column with repeating value based on two other columns

I have the following df:

Doc Item
1    1
1    1
1    2
1    3
2    1
2    2

I want to add a third column whose values (1) increment by one whenever the value in column "Item" changes and (2) restart whenever the value in column "Doc" changes:

Doc Item  NewCol
 1    1     1
 1    1     1
 1    2     2
 1    3     3
 2    1     1 
 2    2     2

What is the best way to achieve this? Thanks a lot.

Upvotes: 1

Views: 65

Answers (1)

jezrael

Reputation: 862651

Use GroupBy.transform with a custom lambda function that calls factorize:

df['NewCol'] = df.groupby('Doc')['Item'].transform(lambda x: pd.factorize(x)[0]) + 1
print(df)
   Doc  Item  NewCol
0    1     1       1
1    1     1       1
2    1     2       2
3    1     3       3
4    2     1       1
5    2     2       2
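
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question; the construction itself is an assumption about how the data is loaded.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Doc": [1, 1, 1, 1, 2, 2],
                   "Item": [1, 1, 2, 3, 1, 2]})

# pd.factorize labels each distinct Item by order of first appearance (0-based);
# grouping by Doc restarts the labelling for every Doc, and + 1 makes it 1-based
df["NewCol"] = df.groupby("Doc")["Item"].transform(lambda x: pd.factorize(x)[0]) + 1
print(df)
```

Note that factorize gives a repeated value the same label even when the repeats are not adjacent, so this numbers distinct Items per Doc rather than counting consecutive changes.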

If the values in Item are integers, GroupBy.rank can be used instead:

df['NewCol'] = df.groupby('Doc')['Item'].rank(method='dense').astype(int)
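
One thing worth checking before picking between the two: rank(method='dense') numbers values by their sorted order within each group, while factorize numbers them by order of first appearance. On the question's data the two coincide because Item is ascending within each Doc. The sketch below uses hypothetical, unsorted data (not from the question) to show where they diverge.

```python
import pandas as pd

# Hypothetical data: within Doc 1, Item is NOT in ascending order
df = pd.DataFrame({"Doc": [1, 1, 1, 2, 2],
                   "Item": [3, 1, 3, 2, 2]})

grouped = df.groupby("Doc")["Item"]

# Numbered by order of first appearance within each Doc
by_appearance = grouped.transform(lambda x: pd.factorize(x)[0]) + 1

# Numbered by sorted value within each Doc
by_value = grouped.rank(method="dense").astype(int)

print(pd.DataFrame({"Doc": df["Doc"], "Item": df["Item"],
                    "factorize": by_appearance, "rank": by_value}))
```

If the numbering should follow the order in which items show up, the factorize version is probably the safer choice; if it should follow the item values themselves, rank does that.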

Upvotes: 2
