matthme
matthme

Reputation: 263

Change repeating groups in a column to incremental groups

I have the following dataframe:

df = pd.DataFrame({'group_nr':[0,0,1,1,1,2,2,3,3,0,0,1,1,2,2,2,3,3]})

print(df)

    group_nr
0          0
1          0
2          1
3          1
4          1
5          2
6          2
7          3
8          3
9          0
10         0
11         1
12         1
13         2
14         2
15         2
16         3
17         3

and would like to change from repeating group numbers to incremental group numbers:

    group_nr  incremental_group_nr
0          0                     0
1          0                     0
2          1                     1
3          1                     1
4          1                     1
5          2                     2
6          2                     2
7          3                     3
8          3                     3
9          0                     4
10         0                     4
11         1                     5
12         1                     5
13         2                     6
14         2                     6
15         2                     6
16         3                     7
17         3                     7

I can't find a way of doing this without looping through the rows. Does someone have an idea how to implement this nicely?

Upvotes: 1

Views: 55

Answers (2)

yatu
yatu

Reputation: 88226

You can check if the values are equal to the following, and take a cumsum of the boolean series to generate the groups:

df['incremental_group_nr'] = df.group_nr.ne(df.group_nr.shift()).cumsum().sub(1)

print(df)

    group_nr  incremental_group_nr
0          0                     0
1          0                     0
2          1                     1
3          1                     1
4          1                     1
5          2                     2
6          2                     2
7          3                     3
8          3                     3
9          0                     4
10         0                     4
11         1                     5
12         1                     5
13         2                     6
14         2                     6
15         2                     6
16         3                     7
17         3                     7

Upvotes: 5

jezrael
jezrael

Reputation: 862511

Compare by shifted values by Series.shift with not equal by Series.ne and then add cumulative sum with subract 1:

df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift()).cumsum() - 1
print(df)
    group_nr  incremental_group_nr
0          0                     0
1          0                     0
2          1                     1
3          1                     1
4          1                     1
5          2                     2
6          2                     2
7          3                     3
8          3                     3
9          0                     4
10         0                     4
11         1                     5
12         1                     5
13         2                     6
14         2                     6
15         2                     6
16         3                     7
17         3                     7

Another idea is use backfilling first missing value after shift by bfill:

df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift().bfill()).cumsum() 
print(df)
    group_nr  incremental_group_nr
0          0                     0
1          0                     0
2          1                     1
3          1                     1
4          1                     1
5          2                     2
6          2                     2
7          3                     3
8          3                     3
9          0                     4
10         0                     4
11         1                     5
12         1                     5
13         2                     6
14         2                     6
15         2                     6
16         3                     7
17         3                     7

Upvotes: 1

Related Questions