Reputation: 263
I have the following dataframe:
df = pd.DataFrame({'group_nr':[0,0,1,1,1,2,2,3,3,0,0,1,1,2,2,2,3,3]})
print(df)
group_nr
0 0
1 0
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 0
10 0
11 1
12 1
13 2
14 2
15 2
16 3
17 3
and would like to change from repeating group numbers to incremental group numbers:
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
I can't find a way of doing this without looping through the rows. Does someone have an idea how to implement this nicely?
Upvotes: 1
Views: 55
Reputation: 88226
You can check if the values are equal to the following, and take a cumsum
of the boolean series to generate the groups:
df['incremental_group_nr'] = df.group_nr.ne(df.group_nr.shift()).cumsum().sub(1)
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Upvotes: 5
Reputation: 862511
Compare by shifted values by Series.shift
with not equal by Series.ne
and then add cumulative sum with subract 1
:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift()).cumsum() - 1
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Another idea is use backfilling first missing value after shift
by bfill
:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift().bfill()).cumsum()
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Upvotes: 1