python pandas Problem repeating the previous value

Question

is my code

import pandas as pd

columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)

Student ID	Course ID	Marks
1	10	100
2	400	200
3	30	300
3	30	300
3	30	300
3	30	300
3	30	300
3	30	300

df1['s']  = np.where((df1['Course ID']  > df1['Marks'])  == True, df1['Student ID'],  df1['s'].shift(1)) 
df1

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN

As you can see, only the information of two rows has changed and the rest are null. This is the result I expect because after column 2 condition "df1['Course ID'] > df1['Marks']" is true

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2

Thank you for your help

mozway · Accepted Answer

If you want to assign the Student ID for rows matching the df1['Course ID'] > df1['Marks'] condition, and for other rows take the previous value, use ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .ffill()
            .convert_dtypes() # optional
           )

Output (with a slightly different input):

   Student ID  Course ID  Marks     s
0           1         10    100  
1           2        400    200     2
2           3         30    300     2
3           3         30    300     2
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3

If you only want to apply this logic per Student ID, which might make more sense to avoid "leaking" values from one student to another rather use groupby.ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .groupby(df1['Student ID']).ffill()
            .convert_dtypes() # optional
           )

Or:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'].gt(df1['Marks'])
                   .groupby(df1['Student ID']).cummax())
            .convert_dtypes() # optional
           )

Output:

   Student ID  Course ID  Marks     s
0           1         10    100  
1           2        400    200     2
2           3         30    300  
3           3         30    300  
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3

python pandas Problem repeating the previous value

Answers (1)

Related Questions

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN