Reputation: 971
I am dealing with the following dataframe:
p q
0 11 2
1 11 2
2 11 2
3 11 3
4 11 3
5 12 2
6 12 2
7 13 2
8 13 2
I want create a new column, say s, which starts with 0 and goes on. This new col is based on "p" column, whenever the p changes, the "s" should change too.
For the first 4 rows, the "p" = 11, so "s" column should have values 0 for this first 4 rows, and so on...
Below is the expectant df:
s p q
0 0 11 2
1 0 11 2
2 0 11 2
3 0 11 2
4 1 11 4
5 1 11 4
6 1 11 4
7 1 11 4
8 2 12 2
9 2 12 2
10 2 12 2
11 3 12 3
12 3 12 3
Upvotes: 1
Views: 104
Reputation: 214927
You need diff
with cumsum
(subtract one if you want the id to start from 0):
df["finalID"] = (df.ProjID.diff() != 0).cumsum()
df
Update, if you want to take both voyg_id and ProjID into consideration, you can use a OR condition on the two columns difference, so that whichever column changes, you get an increase in the final id.
df['final_id'] = ((df.voyg_id.diff() != 0) | (df.proj_id.diff() != 0)).cumsum()
df
Upvotes: 3