Pandas get numbers in column to increase based on previous rows

Question

So I have some data that groups rows based on a number in the column 'Code'. This data is somewhat corrupted however and sometimes does not take the previous row's max as new number but resets it to 1.

Basically the data looks like this:

   Code      Date
0      1  20200501
1      1  20200502
2      2  20200502
3      2  20200502
4      3  20200502
5      3  20200505
6      1  20200505
7      1  20200505
8      2  20200505
9      2  20200505
10     1  20200505
11     1  20200505
12     6  20200505
13     6  20200505

the numbering in the code column is corrupted here and should look like this

   Code      Date
0      1  20200501
1      1  20200502
2      2  20200502
3      2  20200502
4      3  20200502
5      3  20200505
6      4  20200505
7      4  20200505
8      5  20200505
9      5  20200505
10     4  20200505
11     4  20200505
12     6  20200505
13     6  20200505

Can someone help me find a way to fix this data numbering?

EDIT: edited this to provide some more info

So the data groups rows based on numbers. so everything with the same number is part of a group. The numbering in the code column should go up, starting at 1. sometimes however the numbering resets back to 1. so i might have 1-2-3-1, which should be 1-2-3-4, if that makes sense.

anky · Accepted Answer

Based on what I understand, you can try this which uses cummax to determine if the existing value is lesser than the previous max and based on it adds the existing value to it:

a = df['Code'].cummax()
s = a>df['Code']
df['New_Code'] = np.where(s,a+df['Code'],df['Code']

print(df)

    Code      Date  New_Code
0      1  20200501         1
1      1  20200502         1
2      2  20200502         2
3      2  20200502         2
4      3  20200502         3
5      3  20200505         3
6      1  20200505         4
7      1  20200505         4
8      2  20200505         5
9      2  20200505         5
10     1  20200505         4
11     1  20200505         4
12     6  20200505         6
13     6  20200505         6

Pandas get numbers in column to increase based on previous rows

Answers (1)

Related Questions