dfahsjdahfsudaf

Reputation: 481

Replacing certain values in column with string

This is my current data frame:

sports_gpa  music_gpa  Activity  Sport
2           3          nan       nan
0           2          nan       nan
3           3.5        nan       nan
2           1          nan       nan

I have the following condition:

If 'sports_gpa' is greater than 0 and 'music_gpa' is greater than 'sports_gpa', fill the 'Activity' column with the 'sports_gpa' value and fill the 'Sport' column with the string 'basketball'.

Expected output:

sports_gpa  music_gpa  Activity  Sport
2           3          2         basketball
0           2          nan       nan
3           3.5        3         basketball
2           1          nan       nan

To do this I tried the following statement:

df['Activity'], df['Sport'] = np.where(((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa'])), (df['sports_gpa'], 'basketball'), (df['Activity'], df['Sport']))

This fails with an error saying the operands could not be broadcast together, because the tuple pairs a whole column with a single string.

To fix this I could add a helper column to the data frame:

df.loc[:,'str'] = 'basketball'
df['Activity'], df['Sport'] = np.where(((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa'])), (df['sports_gpa'], df['str']), (df['Activity'], df['Sport']))

This gives me my expected output.

Is there a way to avoid this error without creating a new column just to get the string 'basketball' into the 'Sport' column inside the np.where statement?

Upvotes: 1

Views: 58

Answers (2)

ansev

Reputation: 30920

Use np.where + Series.fillna:

where = df['sports_gpa'].ne(0) & (df['sports_gpa'] < df['music_gpa'])
df['Activity'], df['Sport'] = np.where(where, (df['sports_gpa'],df['Sport'].fillna('basketball')), (df['Activity'], df['Sport']))
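A rough sketch of why this works, assuming the four-row frame from the question (rebuilt by hand here, with Sport kept as object dtype, which is my assumption): np.where turns each tuple into a single array, so both elements of a tuple need the same length. Two length-4 Series stack to shape (2, 4), and the length-4 mask broadcasts across both rows. The names if_true and if_false are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's frame (assumption: Sport uses object
# dtype so the column can hold strings alongside NaN).
df = pd.DataFrame({
    "sports_gpa": [2, 0, 3, 2],
    "music_gpa": [3.0, 2.0, 3.5, 1.0],
    "Activity": [np.nan] * 4,
    "Sport": pd.Series([np.nan] * 4, dtype=object),
})

where = df["sports_gpa"].ne(0) & (df["sports_gpa"] < df["music_gpa"])

# Each tuple holds two length-4 Series, so it stacks into a (2, 4) array.
if_true = np.asarray((df["sports_gpa"], df["Sport"].fillna("basketball")))
if_false = np.asarray((df["Activity"], df["Sport"]))
print(if_true.shape, if_false.shape)

# The length-4 mask broadcasts over both rows; unpacking the result
# gives one row per target column.
activity, sport = np.where(where, if_true, if_false)
df["Activity"], df["Sport"] = activity, sport
print(df)
```

A tuple such as (df['sports_gpa'], 'basketball') cannot be stacked this way because its two elements have different lengths, which is where the original statement breaks.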

You can also use Series.where + Series.mask:

df['Activity'] = df['sports_gpa'].where(where)
df['Sport'] = df['Sport'].mask(where, 'basketball')
print(df)

   sports_gpa  music_gpa  Activity       Sport
0           2        3.0       2.0  basketball
1           0        2.0       NaN         NaN
2           3        3.5       3.0  basketball
3           2        1.0       NaN         NaN
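To run the Series.where + Series.mask variant end to end, something like the following should do. It is a minimal sketch: the frame is typed in by hand from the question, and creating Sport with object dtype is an assumption made so that the column can take a string without a dtype change.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question (assumption: Sport is object dtype).
df = pd.DataFrame({
    "sports_gpa": [2, 0, 3, 2],
    "music_gpa": [3.0, 2.0, 3.5, 1.0],
    "Activity": [np.nan] * 4,
    "Sport": pd.Series([np.nan] * 4, dtype=object),
})

# True where sports_gpa is non-zero and lower than music_gpa.
where = df["sports_gpa"].ne(0) & (df["sports_gpa"] < df["music_gpa"])

# Series.where keeps values where the mask is True and puts NaN elsewhere.
df["Activity"] = df["sports_gpa"].where(where)

# Series.mask is the mirror image: it replaces values where the mask is True.
df["Sport"] = df["Sport"].mask(where, "basketball")

print(df)
```

Each column is handled on its own here, so the scalar 'basketball' never has to be stacked next to a full column.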

Upvotes: 1

dfahsjdahfsudaf

Reputation: 481

Just figured out I could do:

df['Activity'], df['Sport'] = np.where(((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa'])), (df['sports_gpa'], df['Sport'].astype(str).replace({"nan": "basketball"})), (df['Activity'], df['Sport']))

Upvotes: 0
