reinhardt
reinhardt

Reputation: 2253

How to compare values in 2 columns and do assignment in a new column?

I have a df named data as follows:

    id  upper_ci    lower_ci    max_power_contractual
0   12858 60.19878860406808 49.827481214215204  0
1   12858 60.61189293066522 49.298784196530896  0
2   12858 60.34397624424309 49.718421137642885  70
3   12858 59.87472261936114 49.464255779713476  10
4   12858 60.2735279368527  49.41672240525131   0

I am trying to create a new column named up_threshold as follows:

I tried:

if (data['max_power_contractual'] in (0, np.nan)) or (data['max_power_contractual'] > data['upper_ci']):
    data['up_threshold'] = data['upper_ci']
elif (data['upper_ci'] > data['max_power_contractual'] == 0):
    data['up_threshold'] = data['max_power_contractual']

But it gives me the following error:

Traceback (most recent call last):

File "/home/cortex/.config/spyder-py3/temp.py", line 179, in data = cp_detection(data, threshold)

File "/home/cortex/.config/spyder-py3/temp.py", line 146, in cp_detection if data['max_power_contractual'] == 0:

File "/home/cortex/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in nonzero f"The truth value of a {type(self).name} is ambiguous. "

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone please tell me my mistake and how can I solve it?

Expected output:

    id  upper_ci    lower_ci    max_power_contractual   up_threshold
0   12858 60.19878860406808 49.827481214215204  0   60.19878860406808 (Since `max_power_contractual` value is 0)
1   12858 60.61189293066522 49.298784196530896  NaN  60.61189293066522 (Since `max_power_contractual` value is NaN)
2   12858 60.34397624424309 49.718421137642885  70   60.34397624424309 (Since `upper_ci < max_power_contractual`)
3   12858 59.87472261936114 49.464255779713476  10  10 (Since `upper_ci > max_power_contractual`)

Upvotes: 0

Views: 42

Answers (3)

bigbounty
bigbounty

Reputation: 17368

Not the efficient way but easier to understand

In [17]: def process(data):
    ...:     result = None
    ...:     if (data['max_power_contractual'] in (0, np.nan)) or (data['max_power_contractual'] > data['upper_ci']):
    ...:        result = data['upper_ci']
    ...:     elif (data['upper_ci'] > data['max_power_contractual']):
    ...:        result = data['max_power_contractual']
    ...:
    ...:     return result
    ...:

In [18]: df.apply(process, axis=1)
Out[18]:
0    60.198789
1    60.611893
2    60.343976
3    10.000000
4    60.273528
dtype: float64

In [19]: df["up_threshold"] = df.apply(process, axis=1)

In [20]: df
Out[20]:
      id   upper_ci   lower_ci  max_power_contractual  up_threshold
0  12858  60.198789  49.827481                      0     60.198789
1  12858  60.611893  49.298784                      0     60.611893
2  12858  60.343976  49.718421                     70     60.343976
3  12858  59.874723  49.464256                     10     10.000000
4  12858  60.273528  49.416722                      0     60.273528

Upvotes: 0

Balaji Ambresh
Balaji Ambresh

Reputation: 5037

Use np.select

import numpy as np

m1 = df.max_power_contractual.isin([np.NaN, 0])
m2 = df.max_power_contractual > df.upper_ci

df['up_threshold'] = np.select([m1, m2], [df.upper_ci, df.upper_ci], default=df.max_power_contractual)
print(df)

Output

      id   upper_ci   lower_ci  max_power_contractual  up_threshold
0  12858  60.198789  49.827481                      0     60.198789
1  12858  60.611893  49.298784                      0     60.611893
2  12858  60.343976  49.718421                     70     60.343976
3  12858  59.874723  49.464256                     10     10.000000
4  12858  60.273528  49.416722                      0     60.273528

Since we're using the same columns as choices, we can also use

df['up_threshold'] = np.where((m1 | m2), df.upper_ci, df.max_power_contractual)

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195418

You can use np.where to add the new column:

df['up_threshold'] = np.where(df['max_power_contractual'].fillna(0) == 0, df['upper_ci'],
    np.where(df['max_power_contractual'] > df['upper_ci'], df['upper_ci'], df['max_power_contractual'])
)

print(df)

Prints:

      id   upper_ci   lower_ci  max_power_contractual  up_threshold
0  12858  60.198789  49.827481                    0.0     60.198789
1  12858  60.611893  49.298784                    NaN     60.611893
2  12858  60.343976  49.718421                   70.0     60.343976
3  12858  59.874723  49.464256                   10.0     10.000000
4  12858  60.273528  49.416722                    0.0     60.273528

Upvotes: 2

Related Questions