I have a DataFrame like this:
import numpy as np
import pandas as pd

students = {'ID': [2, 3, 5, 7, 11, 13],
            'Name': ['John', 'Jane', 'Sam', 'James', 'Stacy', 'Mary'],
            'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
            'school_name': ['College2', 'College2', 'College10', 'College2', 'College2', 'College2'],
            'grade': ['9th', '10th', '9th', '9th', '8th', '5th'],
            'math_score': [90, 89, 88, 89, 89, 90],
            'art_score': [90, 89, 89, 78, 90, 94]}
students_df = pd.DataFrame(students)
Can I use the loc method on students_df to select the math_score and art_score values of the 9th graders at College2 and replace them with NaN? Is there a clean way to do this in one step, rather than splitting it into a subsetting step and a replacing step?
I tried to select this way:
students_df.loc[(students_df['school_name'] == 'College2') & (students_df['grade'] == "9th"),['grade','school_name','math_score','art_score']]
Then I replaced the values this way, which only covers math_score and has to be repeated for art_score:
students_df['math_score'] = np.where((students_df['school_name'] == 'College2') & (students_df['grade'] == '9th'), np.nan, students_df['math_score'])
Can I achieve the same thing in a cleaner and more efficient way using loc and np.nan?
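For comparison, the np.where approach from the question can also cover both score columns in a single assignment, by reshaping the condition to shape (n_rows, 1) so NumPy broadcasts it across the two columns. This is a minimal sketch; the names `cond` and `score_cols` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

students = {'ID': [2, 3, 5, 7, 11, 13],
            'Name': ['John', 'Jane', 'Sam', 'James', 'Stacy', 'Mary'],
            'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
            'school_name': ['College2', 'College2', 'College10', 'College2', 'College2', 'College2'],
            'grade': ['9th', '10th', '9th', '9th', '8th', '5th'],
            'math_score': [90, 89, 88, 89, 89, 90],
            'art_score': [90, 89, 89, 78, 90, 94]}
students_df = pd.DataFrame(students)

# Illustrative names: boolean Series marking 9th graders at College2
cond = (students_df['school_name'] == 'College2') & (students_df['grade'] == '9th')
score_cols = ['math_score', 'art_score']

# cond has shape (6,); [:, None] makes it (6, 1) so it broadcasts over both columns (6, 2)
students_df[score_cols] = np.where(cond.to_numpy()[:, None], np.nan,
                                   students_df[score_cols].to_numpy())
print(students_df)
```

It works, but the reshape and the round trip through NumPy arrays are extra bookkeeping that a single loc assignment does not need.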
Select the rows and the columns to replace in the same loc call, then assign NaN:
students_df.loc[(students_df['school_name'] == 'College2') & (students_df['grade'] == "9th"),['math_score','art_score']] = np.nan
print(students_df)
   ID   Name  Gender  school_name  grade  math_score  art_score
0   2   John       M     College2    9th         NaN        NaN
1   3   Jane       F     College2   10th        89.0       89.0
2   5    Sam       F    College10    9th        88.0       89.0
3   7  James       M     College2    9th         NaN        NaN
4  11  Stacy       F     College2    8th        89.0       90.0
5  13   Mary       F     College2    5th        90.0       94.0
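A self-contained version of the loc approach, with the row condition stored in a variable for readability (the name `cond` is illustrative, not from the answer). Because loc takes the row mask and the column list together, selection and assignment happen in one statement.

```python
import numpy as np
import pandas as pd

students_df = pd.DataFrame({
    'ID': [2, 3, 5, 7, 11, 13],
    'Name': ['John', 'Jane', 'Sam', 'James', 'Stacy', 'Mary'],
    'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
    'school_name': ['College2', 'College2', 'College10', 'College2', 'College2', 'College2'],
    'grade': ['9th', '10th', '9th', '9th', '8th', '5th'],
    'math_score': [90, 89, 88, 89, 89, 90],
    'art_score': [90, 89, 89, 78, 90, 94],
})

# Illustrative name: True only for 9th graders at College2
cond = (students_df['school_name'] == 'College2') & (students_df['grade'] == '9th')

# One loc call: matching rows x the two score columns, set to NaN in place
students_df.loc[cond, ['math_score', 'art_score']] = np.nan
print(students_df)
```

Because NaN is a floating-point value, pandas upcasts the integer math_score and art_score columns to float64, which is why the untouched scores print as 89.0 rather than 89.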