python pandas Problem repeating the previous value

is my code

import pandas as pd

columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)
Student ID Course ID Marks
1 10 100
2 400 200
3 30 300
3 30 300
3 30 300
3 30 300
3 30 300
3 30 300
df1['s']  = np.where((df1['Course ID']  > df1['Marks'])  == True, df1['Student ID'],  df1['s'].shift(1)) 
df1
Student ID Course ID Marks s
1 10 100 NaN
2 400 200 2
3 30 300 2
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN

As you can see, only the information of two rows has changed and the rest are null. This is the result I expect because after column 2 condition "df1['Course ID'] > df1['Marks']" is true

Student ID Course ID Marks s
1 10 100 NaN
2 400 200 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2

Thank you for your help

Upvotes: 1

Views: 77

Answers (1)

mozway
mozway

Reputation: 262284

If you want to assign the Student ID for rows matching the df1['Course ID'] > df1['Marks'] condition, and for other rows take the previous value, use ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .ffill()
            .convert_dtypes() # optional
           )

Output (with a slightly different input):

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300     2
3           3         30    300     2
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3

If you only want to apply this logic per Student ID, which might make more sense to avoid "leaking" values from one student to another rather use groupby.ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .groupby(df1['Student ID']).ffill()
            .convert_dtypes() # optional
           )

Or:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'].gt(df1['Marks'])
                   .groupby(df1['Student ID']).cummax())
            .convert_dtypes() # optional
           )

Output:

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300  <NA>
3           3         30    300  <NA>
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3

Upvotes: 0

Related Questions