elcunyado

Reputation: 361

Conditionally set a new column's value based on another column in Python during a dictionary loop

I have a dictionary that maps keys to data frames:

df_dict = {'grade_1_math': grade_1_class_2029,
'grade_2_eng': grade_2_class_2029,
'grade_3_math': grade_3_class_2029,
'grade_4_eng': grade_4_class_2029,
'grade_5_math': grade_5_class_2029}

Every data frame has a string column called 'Course' that may or may not contain the word 'math'.

I am trying to create a new column called 'Subject' that evaluates 'Course' and is filled with either Mathematics or Language Arts:

for key, val in df_dict.items():
  val['Subject'] = np.where('math' in val['Course'], 'Language Arts', 'Mathematics')

But it comes out wrong every time. What am I doing wrong? I would like to keep this in a dictionary loop, since I do many other things in it as well.

@laurent asked for the output of grade_1_class_2029.head().to_dict() before and after the loop.

Before:

{'Course': {0: 'English 10',
  1: 'English 10',
  2: 'English 10',
  3: 'English 10',
  4: 'English 10'},
 'Current Grade Level': {0: '12', 1: '12', 2: '12', 3: '11', 4: '12'},
 'F1 Percent': {0: <NA>, 1: <NA>, 2: <NA>, 3: <NA>, 4: <NA>},
 'Stored Grade Level': {0: '10', 1: '10', 2: '10', 3: '10', 4: '10'},
 'Student_ID': {0: '20741', 1: '20722', 2: '20583', 3: '21111', 4: '20725'},
 'Y1 Percent': {0: '81.2', 1: '83', 2: '79.5', 3: '59.6', 4: '88.3'}}

After:

{'Course': {0: 'English 10',
  1: 'English 10',
  2: 'English 10',
  3: 'English 10',
  4: 'English 10'},
 'Current Grade Level': {0: '12', 1: '12', 2: '12', 3: '11', 4: '12'},
 'F1 Percent': {0: <NA>, 1: <NA>, 2: <NA>, 3: <NA>, 4: <NA>},
 'Stored Grade Level': {0: '10', 1: '10', 2: '10', 3: '10', 4: '10'},
 'Student_ID': {0: '20741', 1: '20722', 2: '20583', 3: '21111', 4: '20725'},
 'Subject': {0: 'Mathematics',
  1: 'Mathematics',
  2: 'Mathematics',
  3: 'Mathematics',
  4: 'Mathematics'},
 'Y1 Percent': {0: '81.2', 1: '83', 2: '79.5', 3: '59.6', 4: '88.3'}}

Upvotes: 2

Views: 44

Answers (1)

Shubham Sharma

Reputation: 71689

Cause of the problem:

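val['Course'] is a pandas Series, and the in operator on a Series tests the index labels, not the values. The condition 'math' in val['Course'] therefore evaluates to a single False (unless an index label happens to be 'math'), so np.where always selects its second choice, Mathematics, for every row. The two choices are also swapped in the question: the value for a True condition must come first.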
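A small sketch of this behaviour, using a made-up two-row Series (the data and the variable names here are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the default integer index (labels 0 and 1)
course = pd.Series(['English 10', 'Math 10'])

# `in` on a Series looks at the index labels, never at the values
substring_hit = 'math' in course    # False: no label equals 'math'
value_hit = 'Math 10' in course     # False: even an exact value is not found
label_hit = 0 in course             # True: 0 is an index label

# With a scalar False condition, np.where returns only the second choice,
# which pandas then broadcasts to every row on assignment
subject = np.where(substring_hit, 'Language Arts', 'Mathematics')
print(substring_hit, value_hit, label_hit, subject)
```

Because the condition is one Python bool rather than one bool per row, every row ends up with the same value.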

How to fix the problem?

To fix it, test each string in the Course column for the word math with str.contains, then pass the resulting boolean mask to np.where, which picks the choice row by row:

for key, val in df_dict.items():
    mask = val['Course'].str.contains('math', case=False)
    val['Subject'] = np.where(mask, 'Mathematics', 'Language Arts')

Result for grade_1_class_2029

>>> grade_1_class_2029

       Course Current Grade Level  F1 Percent Stored Grade Level Student_ID Y1 Percent        Subject
0  English 10                  12         NaN                 10      20741       81.2  Language Arts
1  English 10                  12         NaN                 10      20722         83  Language Arts
2  English 10                  12         NaN                 10      20583       79.5  Language Arts
3  English 10                  11         NaN                 10      21111       59.6  Language Arts
4  English 10                  12         NaN                 10      20725       88.3  Language Arts
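The loop can also be tried end to end without the original data. The sketch below uses two small made-up frames (math_df and eng_df are hypothetical stand-ins for the asker's data frames). It additionally passes na=False, which is an assumption on my part and not part of the loop above: if Course can contain missing values, str.contains returns a missing value for those rows, and na=False makes them count as "no match" instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real data frames
math_df = pd.DataFrame({'Course': ['Math 10', 'AP Mathematics', None]})
eng_df = pd.DataFrame({'Course': ['English 10', 'English 10']})
df_dict = {'grade_1_math': math_df, 'grade_2_eng': eng_df}

for key, val in df_dict.items():
    # One boolean per row; case=False matches 'Math' as well as 'math',
    # na=False treats a missing Course as not containing 'math'
    mask = val['Course'].str.contains('math', case=False, na=False)
    # The first choice is used where mask is True, the second where it is False
    val['Subject'] = np.where(mask, 'Mathematics', 'Language Arts')

print(math_df)
print(eng_df)
```

The loop adds the column to the frames stored in the dictionary in place, so math_df and eng_df themselves carry the new Subject column afterwards.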

Upvotes: 1
