So I have a dictionary whose keys map to data frames:
df_dict = {'grade_1_math': grade_1_class_2029,
           'grade_2_eng': grade_2_class_2029,
           'grade_3_math': grade_3_class_2029,
           'grade_4_eng': grade_4_class_2029,
           'grade_5_math': grade_5_class_2029}
Every data frame has a string column called 'Course' that may or may not contain the word 'math'.
I am trying to create a new column called 'Subject' that checks 'Course' and fills in either 'Mathematics' or 'Language Arts':
for key, val in df_dict.items():
    val['Subject'] = np.where('math' in val['Course'], 'Language Arts', 'Mathematics')
But it comes out wrong every time: every row gets 'Mathematics', even for English courses. Am I doing something wrong? I would like to keep this inside the dictionary loop, since I do many other things there as well.
As @laurent requested, here is grade_1_class_2029.head().to_dict() before and after the loop:
Before:
{'Course': {0: 'English 10',
            1: 'English 10',
            2: 'English 10',
            3: 'English 10',
            4: 'English 10'},
 'Current Grade Level': {0: '12', 1: '12', 2: '12', 3: '11', 4: '12'},
 'F1 Percent': {0: <NA>, 1: <NA>, 2: <NA>, 3: <NA>, 4: <NA>},
 'Stored Grade Level': {0: '10', 1: '10', 2: '10', 3: '10', 4: '10'},
 'Student_ID': {0: '20741', 1: '20722', 2: '20583', 3: '21111', 4: '20725'},
 'Y1 Percent': {0: '81.2', 1: '83', 2: '79.5', 3: '59.6', 4: '88.3'}}
After:
{'Course': {0: 'English 10',
            1: 'English 10',
            2: 'English 10',
            3: 'English 10',
            4: 'English 10'},
 'Current Grade Level': {0: '12', 1: '12', 2: '12', 3: '11', 4: '12'},
 'F1 Percent': {0: <NA>, 1: <NA>, 2: <NA>, 3: <NA>, 4: <NA>},
 'Stored Grade Level': {0: '10', 1: '10', 2: '10', 3: '10', 4: '10'},
 'Student_ID': {0: '20741', 1: '20722', 2: '20583', 3: '21111', 4: '20725'},
 'Subject': {0: 'Mathematics',
             1: 'Mathematics',
             2: 'Mathematics',
             3: 'Mathematics',
             4: 'Mathematics'},
 'Y1 Percent': {0: '81.2', 1: '83', 2: '79.5', 3: '59.6', 4: '88.3'}}
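Cause of the problem:
val['Course'] is a pandas Series, and the in operator on a Series tests membership in its index (the row labels 0, 1, 2, ...), not in its values. No row label equals 'math', so the condition 'math' in val['Course'] evaluates to False. np.where therefore receives a single scalar False, always picks its third argument, 'Mathematics', and that value is broadcast to every row. (Note also that the two choices are in reverse order: even with a correct row-wise test, rows containing 'math' would be labelled 'Language Arts'.)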
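A small reproduction of this, using the Course values from the posted head() output (the Series name courses is just for illustration), shows that in looks at the index labels and that np.where then collapses to a single value:

```python
import numpy as np
import pandas as pd

# Course values from the posted grade_1_class_2029.head(); default index 0..4
courses = pd.Series(['English 10'] * 5)

# `in` on a Series checks the index labels (0..4), not the strings
print('math' in courses)   # False
print(0 in courses)        # True, because 0 is an index label

# np.where gets one scalar False, so the result is a single value (0-d array)
subject = np.where('math' in courses, 'Language Arts', 'Mathematics')
print(subject)             # Mathematics
```

Assigning that 0-d result to a column broadcasts the same string to every row, which matches the "After" output in the question.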
How to fix the problem?
Check for the word math in every string of the Course column with Series.str.contains, which returns a row-by-row boolean mask, and then pass that mask to np.where so that each row gets the matching choice:
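import numpy as np

for key, val in df_dict.items():
    # Row-wise, case-insensitive test; missing Course values count as "no match"
    mask = val['Course'].str.contains('math', case=False, na=False)
    val['Subject'] = np.where(mask, 'Mathematics', 'Language Arts')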
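To try the fix without the real class data, here is a self-contained sketch in which two small, hypothetical frames (with made-up course names) stand in for the grade_*_class_2029 data frames:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real grade_*_class_2029 frames
df_dict = {
    'grade_1_math': pd.DataFrame({'Course': ['Math 1', 'Math 1 Honors']}),
    'grade_2_eng': pd.DataFrame({'Course': ['English 10', 'English 10']}),
}

for key, val in df_dict.items():
    # One boolean per row: does this Course contain "math" (any case)?
    mask = val['Course'].str.contains('math', case=False, na=False)
    # True -> 'Mathematics', False -> 'Language Arts'
    val['Subject'] = np.where(mask, 'Mathematics', 'Language Arts')

for key, val in df_dict.items():
    # grade_1_math rows are all 'Mathematics'; grade_2_eng rows are all 'Language Arts'
    print(key, val['Subject'].tolist())
```

Because val is the same DataFrame object that is stored in the dictionary, adding the column inside the loop updates the dictionary's frames in place, so the rest of the per-frame work can stay in the same loop.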
Result for grade_1_class_2029:
>>> grade_1_class_2029
       Course  Current Grade Level  F1 Percent  Stored Grade Level  Student_ID  Y1 Percent        Subject
0  English 10                   12         NaN                  10       20741        81.2  Language Arts
1  English 10                   12         NaN                  10       20722          83  Language Arts
2  English 10                   12         NaN                  10       20583        79.5  Language Arts
3  English 10                   11         NaN                  10       21111        59.6  Language Arts
4  English 10                   12         NaN                  10       20725        88.3  Language Arts
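If you prefer not to use NumPy for this step, a pandas-only variant is also possible. The sketch below is an alternative to the np.where version, not what the answer uses: it maps the same boolean mask to the two labels with Series.map. The one-column frame, including the missing value, is hypothetical:

```python
import pandas as pd

# Hypothetical data, including a missing Course value
df = pd.DataFrame({'Course': ['English 10', 'Math 1', None]})

# na=False makes the missing Course count as "not math"
mask = df['Course'].str.contains('math', case=False, na=False)

# Map each boolean in the mask to its subject label
df['Subject'] = mask.map({True: 'Mathematics', False: 'Language Arts'})
print(df)
```

Both versions produce the same labels. np.where is usually the more compact choice, while map keeps the True/False-to-label correspondence explicit in a single dictionary.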