Replacing Values in Dataframe Based on Condition

Question

I am using a pandas DataFrame in Python that has a column of percentages. I would like to replace values greater than 50% with 'Likely' and all other values with 'Not-Likely'.

Here are the options I found:

df.apply
df.iterrows
df.where

This works using df.iterrows:

for index, row in df.iterrows():
    if row['Chance'] > 0.50:
        df.loc[index, 'Chance'] = 'Likely'
    else:
        df.loc[index, 'Chance'] = 'Not-Likely'

However, I have read that this is not an optimal way of 'updating' values.

How would you do this using the other methods and which one would you recommend? Also, if you know any other methods, please share! Thanks

chitown88 · Accepted Answer

Give this a shot.

import numpy as np

df['Chance'] = np.where(df['Chance'] > 0.50, 'Likely', 'Not-Likely')

Note, however, that this labels any value exactly equal to 0.50 as 'Not-Likely'.
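If 0.50 should count as 'Likely', switching the comparison to >= is one way to do it. Here is a minimal sketch of the difference; the three-row DataFrame is made up for illustration, since the question does not show its data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's actual DataFrame is not shown.
df = pd.DataFrame({"Chance": [0.25, 0.50, 0.75]})

# Strict comparison, as in the answer above: 0.50 falls into 'Not-Likely'.
strict = np.where(df["Chance"] > 0.50, "Likely", "Not-Likely")

# Inclusive comparison: 0.50 falls into 'Likely' instead.
inclusive = np.where(df["Chance"] >= 0.50, "Likely", "Not-Likely")

print(strict.tolist())
print(inclusive.tolist())
```

Either result can be assigned back with df['Chance'] = ... exactly as in the answer.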

Just as a side note, .itertuples() is said to be about 10x faster than .iterrows(), and zip about 100x faster.
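For completeness, here is a rough sketch of what the row-wise alternatives could look like, including the apply option from the question. The sample data and the label helper are hypothetical, and no timings are measured here, so treat the speed figures above as rules of thumb rather than guarantees.

```python
import pandas as pd

# Hypothetical sample data; the question's actual DataFrame is not shown.
df = pd.DataFrame({"Chance": [0.25, 0.50, 0.75]})


def label(chance):
    """Hypothetical helper: map a probability to the two labels."""
    return "Likely" if chance > 0.50 else "Not-Likely"


# Series.apply: calls label() once per value in the column.
via_apply = df["Chance"].apply(label)

# itertuples: each row is a namedtuple, so the column is an attribute.
via_itertuples = [label(row.Chance) for row in df.itertuples(index=False)]

# zip: iterate over the column values directly (add more columns as needed).
via_zip = [label(chance) for (chance,) in zip(df["Chance"])]

# Any of the three results can be written to a column.
df["Label"] = via_zip
```

All three still loop in Python, so for a simple threshold like this the vectorized np.where call is usually the better fit.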
