Better way of creating Pandas Dataframe based on condition

Question

I have a task to create Dataframes based on conditions within other Dataframes.

I've been doing it the same way for about a week now, but I was curious whether there is a better way. I stumbled across this example. I know the example creates a separate column based on conditions, but it made me wonder whether my code could be improved.

Here is a shortened version of the code in the link for ease of use:

import pandas as pd
import numpy as np

raw_data = {'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan]}  # np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(raw_data, columns = ['student_name', 'test_score'])

print(df)

grades = []

for row in df['test_score']:
    if row > 59:
        grades.append('Pass')
    else:
        grades.append('fail')
df['grades'] = grades
print(df)

   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail

Going along with the above example, if I did not want to make a "grades" column but instead wanted a DataFrame of all the people who passed, I would do this:

pass_df = df[df['test_score'] > 59]
print(pass_df)

Is there a better way of doing this?

miradulo · Accepted Answer

The new column can be assigned more nicely using np.where.

df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')

As for selecting the rows where the test score is greater than 59, your approach is standard. However, if you intend to treat the result as its own DataFrame, you will want to call .copy() on it; otherwise pandas may emit a SettingWithCopyWarning when you later modify it.
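To illustrate the .copy() advice, here is a minimal sketch. It uses a trimmed-down stand-in for the question's data (three rows only, which is my assumption for brevity) and shows that a column added to the copied subset does not touch the original df.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (three rows for brevity).
df = pd.DataFrame({
    'student_name': ['Miller', 'Cooze', 'Jacon'],
    'test_score': [76, 53, 96],
})

# Boolean indexing keeps only the passing rows; .copy() makes the result
# an independent DataFrame instead of something tied to df.
pass_df = df[df['test_score'] > 59].copy()

# Modifying the copy leaves the original DataFrame unchanged.
pass_df['grades'] = 'Pass'

print(pass_df)
print(df.columns.tolist())
```

Without the .copy() call the selection itself is the same; the copy only matters once you start assigning into pass_df.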

Demo

>>> df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')

>>> df
   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail
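Note that row 11 ends up as 'fail' because a comparison against NaN is False. If a missing score should not count as a failure, one possible variation (not part of the original answer) is np.select, which checks several conditions in order. The 'missing' label below is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.Series([76, 53, np.nan], name='test_score')

# NaN > 59 evaluates to False, so np.where labels the missing score 'fail'.
plain = np.where(scores > 59, 'Pass', 'fail')

# np.select evaluates the conditions in order and the first match wins;
# rows matching no condition get the default. 'missing' is a made-up label.
labels = np.select(
    [scores.isna(), scores > 59],
    ['missing', 'Pass'],
    default='fail',
)

print(plain.tolist())
print(labels.tolist())
```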
