Reputation: 2143

calculating the value of 1 column based on conditions on other columns

I'm trying to calculate the value of a given column based on a condition.

The base dataframe looks like this (assume that columns a and b come from a previous manipulation, hence the insertion):

import pandas as pd
import numpy as np

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})

df.insert(1, 'calculated', np.nan)

Now, I'm trying to calculate the value of 'calculated' based on 'a' and 'b'.

I tried iterating over the dataframe rows, but the 'calculated' column does not get calculated...

for index, row in df.iterrows():
    if row['a']>2:
        row['calculated'] = row['b']*2
    else:
        row['calculated'] = row['b']
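The loop has no effect because `iterrows()` yields each row as a new Series. The pandas documentation warns that, depending on the dtypes, this is a copy rather than a view, so assigning to `row[...]` never reaches `df`. If a loop is really wanted, a minimal sketch is to write through the DataFrame itself with `df.at[index, column]`. It is still much slower than the vectorized approaches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})
df.insert(1, 'calculated', np.nan)

for index, row in df.iterrows():
    # `row` is a copy, so write the result back through the DataFrame with .at
    if row['a'] > 2:
        df.at[index, 'calculated'] = row['b'] * 2
    else:
        df.at[index, 'calculated'] = row['b']

print(df)
```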

Using df.apply does not seem to do the trick, because all the examples I found were using lambdas (how do you pass the values of 'a' to a lambda and get the result back into 'calculated'?)

I managed to do it with the following code:

df.loc[df['a'] > 2, 'calculated'] = df['b']*2
df.loc[df['a'] <= 2, 'calculated'] = df['b']
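One way to make the `.loc` version less error-prone is to assign the "else" value to every row first and then override only the rows that match, so only one condition has to be written and kept correct. A minimal sketch with the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})
df.insert(1, 'calculated', np.nan)

# Start from the default ("else") value for every row...
df['calculated'] = df['b']
# ...then overwrite only the rows where the condition holds.
df.loc[df['a'] > 2, 'calculated'] = df['b'] * 2

print(df)
```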

However, this code is quite error-prone and somewhat hard to read.

Is there a much cleaner way to achieve this, one that makes it easy to add more logic?

Something like this?

def get_calculated_value(row):
  if row['a'] > 2:
    row['calculated'] = row['b'] * 2
  else:
    row['calculated'] = row['a']

df.apply(get_calculated_value())

Upvotes: 0

Answers (3)

Benoit de Menthière

Reputation: 743

A much faster way to do it is np.where, which evaluates the condition for the whole column at once instead of row by row:

df['calculated'] = np.where(df['a'] > 2, df['b'] * 2, df['b'])
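np.where handles a single if/else. When more branches are needed (the "add logic easily" part of the question), `np.select` takes a list of conditions and a matching list of choices and, for each row, uses the first condition that is true, falling back to `default` otherwise. A sketch; the `a > 4` rule is a hypothetical extra condition added only to show how branches stack up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# Conditions are checked in order; the first one that matches wins.
conditions = [
    df['a'] > 4,  # hypothetical extra rule, for illustration only
    df['a'] > 2,
]
choices = [
    df['b'] * 3,
    df['b'] * 2,
]
# Rows that match no condition get the value from column 'b'.
df['calculated'] = np.select(conditions, choices, default=df['b'])

print(df)
```

Adding another rule means adding one entry to each list, keeping the more specific conditions first.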

Upvotes: 0

vb_rises

Reputation: 1907

You can use the apply function with a lambda. You don't need to assign the 'calculated' column inside the function; just return the value and assign the result of apply() to the column. Using apply() also makes it easy to add or modify conditions later on.

def myfunc(row):
    if row['a'] > 2:
        return row['b'] * 2
    else:
        return row['a']

df['calculated'] = df.apply(lambda x: myfunc(x), axis=1)

#output
df

    a   b   calculated
0   1   6   1
1   2   7   2
2   3   8   16
3   4   9   18
4   5   10  20
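The lambda wrapper is not strictly needed: `df.apply(myfunc, axis=1)` passes each row to `myfunc` directly. For short rules, the whole condition can also live inside the lambda as a conditional expression, which is one way to "pass the values of a" through a lambda. A small sketch of both forms (the `calculated_lambda` column name is only for comparison):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

def myfunc(row):
    # Return the value; apply() collects the results into a Series.
    if row['a'] > 2:
        return row['b'] * 2
    return row['a']

# Passing the function directly is equivalent to lambda x: myfunc(x).
df['calculated'] = df.apply(myfunc, axis=1)

# The same rule written entirely as a lambda with a conditional expression.
df['calculated_lambda'] = df.apply(
    lambda row: row['b'] * 2 if row['a'] > 2 else row['a'], axis=1
)

print(df)
```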

Upvotes: 1

Dev Khadka

Reputation: 5451

import pandas as pd
import numpy as np

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10]})

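# keep 'b' where a <= 2, otherwise replace it with b*2
df['calculated'] = df["b"].where(df["a"] <= 2, df["b"] * 2)
print(df)  # display(df) only works inside Jupyter/IPython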
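`Series.where` keeps values where the condition is True and replaces the rest, so the condition has to be phrased as the "keep" case. `Series.mask` is its mirror image (it replaces values where the condition is True), which lets the rule be written the same way it is stated in the question. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# Replace 'b' with b*2 wherever a > 2; other rows keep 'b' unchanged.
df['calculated'] = df['b'].mask(df['a'] > 2, df['b'] * 2)

print(df)
```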

Upvotes: 1
