Reputation: 2143
I'm trying to calculate the value of a given column based on a condition.
The base dataframe looks like this (assuming that cols a and b are coming from a previous manipulation, hence the insertion):
import pandas as pd
import numpy as np
df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})
df.insert(1, 'calculated', np.nan)
Now, I'm trying to calculate the value of 'calculated' based on 'a' and 'b'.
I tried iterating over the dataframe rows, but the 'calculated' column does not get calculated...
for index, row in df.iterrows():
    if row['a'] > 2:
        row['calculated'] = row['b'] * 2
    else:
        row['calculated'] = row['b']
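The loop above most likely has no effect because `iterrows()` yields each row as a Series that is, in general, a copy of the data, so assigning to `row['calculated']` never writes back to `df` (the pandas documentation warns against modifying rows obtained from `iterrows()`). A minimal sketch of a loop that does update the frame, writing each result through `df.at[index, column]`, assuming the same `df` as above. Row-wise loops like this stay slow on large frames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})
df.insert(1, 'calculated', np.nan)

for index, row in df.iterrows():
    # row is (in general) a copy, so write the result back into df via df.at
    if row['a'] > 2:
        df.at[index, 'calculated'] = row['b'] * 2
    else:
        df.at[index, 'calculated'] = row['b']

print(df)
```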
df.apply does not seem to do the trick, because all the examples I found were using lambdas (how do you pass the values of a to a lambda and return the result into calculated?). I managed to do it with the following code:
df.loc[df['a'] > 2, 'calculated'] = df['b']*2
df.loc[df['a'] <= 2, 'calculated'] = df['b']
However, this code is quite error-prone and kind of hard to read.
Is there a much cleaner way to achieve this, one that makes it easy to add more logic?
Something like this?
def get_calculated_value(row):
    if row['a'] > 2:
        row['calculated'] = row['b'] * 2
    else:
        row['calculated'] = row['a']

df.apply(get_calculated_value, axis=1)
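For the two `.loc` lines in the question, one way to make them less error-prone is to compute the condition once and reuse its negation (`~mask`), so the two branches can never disagree about which rows they cover. A minimal sketch, assuming the same `df` as in the question; `mask` is just an illustrative variable name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})
df.insert(1, 'calculated', np.nan)

# Boolean Series, computed once: True where the "doubling" branch applies
mask = df['a'] > 2
df.loc[mask, 'calculated'] = df.loc[mask, 'b'] * 2
# ~mask selects exactly the remaining rows
df.loc[~mask, 'calculated'] = df.loc[~mask, 'b']

print(df)
```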
Upvotes: 0
Views: 1273
Reputation: 743
There is a much faster, vectorized way to do it using np.where:
df['calculated'] = np.where(df.a > 2, 2 * df.b, df.b)
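If more branches are needed later, `np.select` generalizes `np.where` to a list of conditions that are checked in order (the first True condition wins), with a `default` for rows that match none. A sketch under that assumption; the extra `a > 4` branch that triples `b` is hypothetical and only there to show where a new condition would go:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# Conditions are checked in order; the first one that is True picks the value.
conditions = [
    df['a'] > 4,  # hypothetical extra branch, for illustration only
    df['a'] > 2,
]
choices = [
    df['b'] * 3,
    df['b'] * 2,
]
# Rows matching no condition fall back to the plain value of b
df['calculated'] = np.select(conditions, choices, default=df['b'])

print(df)
```

Adding a rule then means appending one condition and one choice, instead of writing another `.loc` line with a hand-maintained complementary condition.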
Upvotes: 0
Reputation: 1907
You can use the apply function with a lambda. You don't need to assign the 'calculated' column inside the function; just return the value and assign the result of apply() to the column. Using apply() also makes it easy to add or modify conditions later on.
def myfunc(row):
    if row['a'] > 2:
        return row['b'] * 2
    else:
        return row['a']

df['calculated'] = df.apply(lambda x: myfunc(x), axis=1)
#output
df
a b calculated
0 1 6 1
1 2 7 2
2 3 8 16
3 4 9 18
4 5 10 20
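This also answers how to use a lambda directly: with `axis=1`, `apply` passes each row to the function as a Series, so the lambda can read `row['a']` and `row['b']` and return the new value, and pandas collects those return values into a Series. A minimal sketch using a conditional expression, following the rule from the question's `.loc` code (`b` in the else branch):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# axis=1: the lambda receives one row (a Series) at a time and returns a scalar
df['calculated'] = df.apply(lambda row: row['b'] * 2 if row['a'] > 2 else row['b'], axis=1)

print(df)
```

The lambda wrapper in `df.apply(lambda x: myfunc(x), axis=1)` is optional; `df.apply(myfunc, axis=1)` passes the function itself and does the same thing.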
Upvotes: 1
Reputation: 5451
import pandas as pd
import numpy as np
df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})
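# Series.where keeps values where the condition is True and takes the second argument elsewhere
df['calculated'] = df["b"].where(df["a"] <= 2, df["b"] * 2)
print(df)  # or display(df) in Jupyter/IPython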
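`Series.mask(cond, other)` is the mirror image of `Series.where`: it replaces values where `cond` is True and keeps the rest, so the condition can be written exactly as in the question (`a > 2`) instead of being inverted. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# mask() swaps in b * 2 wherever a > 2 and leaves b unchanged elsewhere
df['calculated'] = df['b'].mask(df['a'] > 2, df['b'] * 2)

print(df)
```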
Upvotes: 1