Reputation: 123
So I have some data that groups rows based on a number in the column 'Code'. This data is somewhat corrupted however and sometimes does not take the previous row's max as new number but resets it to 1.
Basically the data looks like this:
Code Date
0 1 20200501
1 1 20200502
2 2 20200502
3 2 20200502
4 3 20200502
5 3 20200505
6 1 20200505
7 1 20200505
8 2 20200505
9 2 20200505
10 1 20200505
11 1 20200505
12 6 20200505
13 6 20200505
the numbering in the code column is corrupted here and should look like this
Code Date
0 1 20200501
1 1 20200502
2 2 20200502
3 2 20200502
4 3 20200502
5 3 20200505
6 4 20200505
7 4 20200505
8 5 20200505
9 5 20200505
10 4 20200505
11 4 20200505
12 6 20200505
13 6 20200505
Can someone help me find a way to fix this data numbering?
EDIT: edited this to provide some more info
So the data groups rows based on numbers. so everything with the same number is part of a group. The numbering in the code column should go up, starting at 1. sometimes however the numbering resets back to 1. so i might have 1-2-3-1, which should be 1-2-3-4, if that makes sense.
Upvotes: 2
Views: 145
Reputation: 75100
Based on what I understand, you can try this which uses cummax to determine if the existing value is lesser than the previous max and based on it adds the existing value to it:
a = df['Code'].cummax()
s = a>df['Code']
df['New_Code'] = np.where(s,a+df['Code'],df['Code']
print(df)
Code Date New_Code
0 1 20200501 1
1 1 20200502 1
2 2 20200502 2
3 2 20200502 2
4 3 20200502 3
5 3 20200505 3
6 1 20200505 4
7 1 20200505 4
8 2 20200505 5
9 2 20200505 5
10 1 20200505 4
11 1 20200505 4
12 6 20200505 6
13 6 20200505 6
Upvotes: 3