E. Jaep

Reputation: 2143

Calculating the value of one column based on conditions on other columns

I'm trying to calculate the value of a given column based on a condition.

The base dataframe looks like this (assuming that cols a and b are coming from a previous manipulation, hence the insertion):

import pandas as pd
import numpy as np

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})

df.insert(1, 'calculated', np.nan)

Now, I'm trying to calculate the value of 'calculated' based on 'a' and 'b'.

I tried iterating over the dataframe rows, but the 'calculated' column does not get calculated...

for index, row in df.iterrows():
    if row['a']>2:
        row['calculated'] = row['b']*2
    else:
        row['calculated'] = row['b']


I managed to do it with the following code:

df.loc[df['a'] > 2, 'calculated'] = df['b']*2
df.loc[df['a'] <= 2, 'calculated'] = df['b']

However, this code is error-prone and hard to read.

Is there a cleaner way to achieve this, one that makes it easy to add more logic later?

Something like this?

def get_calculated_value(row):
  if row['a'] > 2:
    row['calculated'] = row['b'] * 2
  else:
    row['calculated'] = row['a']

df.apply(get_calculated_value())

Upvotes: 0

Views: 1273

Answers (3)

There is a much faster, vectorized way to do it using np.where, which evaluates the condition on whole columns instead of row by row:

df['calculated']=np.where(df.a>2,2*df.b,df.b)
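If more branches are needed later, np.select is one possible extension of the same idea. The following is only a minimal sketch: the extra rule (triple 'b' when 'a' > 4) is made up for illustration and is not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# Conditions are checked in order; the first one that matches wins.
# The 'a' > 4 rule is a hypothetical extra branch, added only for illustration.
conditions = [df['a'] > 4, df['a'] > 2]
# One result per condition, in the same order.
choices = [df['b'] * 3, df['b'] * 2]

# Rows that match no condition fall back to the default, here the plain 'b' value.
df['calculated'] = np.select(conditions, choices, default=df['b'])
print(df)
```

Adding another rule then means appending one entry to each list, keeping the more specific conditions first.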

Upvotes: 0

vb_rises

Reputation: 1907

You can use apply() with a lambda. The function only needs to return the value, so there is no need to assign the 'calculated' column inside it. With apply(), it is also easy to add or change conditions later on.

def myfunc(row):
    if row['a'] > 2:
        return row['b'] * 2
    else:
        return row['a']

df['calculated'] = df.apply(lambda x : myfunc(x), axis=1)

#output
df

    a   b   calculated
0   1   6   1
1   2   7   2
2   3   8   16
3   4   9   18
4   5   10  20
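As for why the loop in the question leaves the column empty: as far as I know, iterrows() generally hands back each row as a separate Series (a copy, especially with mixed dtypes as here), so assigning to row['calculated'] does not reach the DataFrame. If a loop is really wanted, a sketch like the following writes through the DataFrame instead. It assumes the same rule as myfunc above and is slower than apply() or the vectorized options.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})
df.insert(1, 'calculated', np.nan)

for index, row in df.iterrows():
    # 'row' is a separate Series, so it is only read from here...
    value = row['b'] * 2 if row['a'] > 2 else row['a']
    # ...and the result is written to the DataFrame itself, addressed by label.
    df.at[index, 'calculated'] = value

print(df)
```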

Upvotes: 1

Dev Khadka

Reputation: 5451

import pandas as pd
import numpy as np

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})

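# Series.where keeps the value where the condition is True and uses the second
# argument elsewhere, so the condition tests 'a' and is the inverse of a > 2
df['calculated'] = df["b"].where(df["a"] <= 2, df["b"]*2)
print(df)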
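One thing that is easy to get backwards: Series.where keeps the original value where the condition is True and takes the replacement elsewhere, which is the opposite reading from np.where(condition, value_if_true, value_if_false). Series.mask is the mirror image and replaces where the condition is True. A small sketch comparing the two, assuming the rule "double 'b' when 'a' > 2, otherwise keep 'b'" from the question's .loc version:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# where: keep 'b' where a <= 2, otherwise take b * 2.
with_where = df['b'].where(df['a'] <= 2, df['b'] * 2)

# mask: replace 'b' with b * 2 where a > 2, otherwise keep 'b'.
with_mask = df['b'].mask(df['a'] > 2, df['b'] * 2)

# Both spellings should describe the same column.
print(with_where.equals(with_mask))
df['calculated'] = with_mask
print(df)
```

Which of the two reads better depends on whether the rule is easier to state as "keep unless" or "replace when".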

Upvotes: 1
