Reputation: 13
is my code
import pandas as pd
columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)
Student ID | Course ID | Marks |
---|---|---|
1 | 10 | 100 |
2 | 400 | 200 |
3 | 30 | 300 |
3 | 30 | 300 |
3 | 30 | 300 |
3 | 30 | 300 |
3 | 30 | 300 |
3 | 30 | 300 |
df1['s'] = np.where((df1['Course ID'] > df1['Marks']) == True, df1['Student ID'], df1['s'].shift(1))
df1
Student ID | Course ID | Marks | s |
---|---|---|---|
1 | 10 | 100 | NaN |
2 | 400 | 200 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | NaN |
3 | 30 | 300 | NaN |
3 | 30 | 300 | NaN |
3 | 30 | 300 | NaN |
3 | 30 | 300 | NaN |
As you can see, only the information of two rows has changed and the rest are null. This is the result I expect because after column 2 condition "df1['Course ID'] > df1['Marks']" is true
Student ID | Course ID | Marks | s |
---|---|---|---|
1 | 10 | 100 | NaN |
2 | 400 | 200 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | 2 |
3 | 30 | 300 | 2 |
Thank you for your help
Upvotes: 1
Views: 77
Reputation: 262284
If you want to assign the Student ID for rows matching the df1['Course ID'] > df1['Marks']
condition, and for other rows take the previous value, use ffill
:
df1['s'] = (df1['Student ID']
.where(df1['Course ID'] > df1['Marks'])
.ffill()
.convert_dtypes() # optional
)
Output (with a slightly different input):
Student ID Course ID Marks s
0 1 10 100 <NA>
1 2 400 200 2
2 3 30 300 2
3 3 30 300 2
4 3 400 300 3
5 3 30 300 3
6 3 30 300 3
7 3 30 300 3
If you only want to apply this logic per Student ID, which might make more sense to avoid "leaking" values from one student to another rather use groupby.ffill
:
df1['s'] = (df1['Student ID']
.where(df1['Course ID'] > df1['Marks'])
.groupby(df1['Student ID']).ffill()
.convert_dtypes() # optional
)
Or:
df1['s'] = (df1['Student ID']
.where(df1['Course ID'].gt(df1['Marks'])
.groupby(df1['Student ID']).cummax())
.convert_dtypes() # optional
)
Output:
Student ID Course ID Marks s
0 1 10 100 <NA>
1 2 400 200 2
2 3 30 300 <NA>
3 3 30 300 <NA>
4 3 400 300 3
5 3 30 300 3
6 3 30 300 3
7 3 30 300 3
Upvotes: 0