Numpy.where evaluating as True when condition is False

Question

I'm currently experiencing some unexpected behaviour in numpy. I am trying to add a column to a DataFrame that does some math on two other columns. These columns also contain a few 'N/A' strings.

import pandas as pd
import numpy as np

my_list = []
my_list.append({'Value A':1, 'Value B':2})
my_list.append({'Value A':6, 'Value B':4})
my_list.append({'Value A':7, 'Value B':5})
my_list.append({'Value A':'N/A', 'Value B':6})
my_list.append({'Value A':12, 'Value B':10})
my_list.append({'Value A':2, 'Value B':2})
my_list.append({'Value A':9, 'Value B':'N/A'})
my_list.append({'Value A':8, 'Value B':3})
my_list.append({'Value A':22, 'Value B':6})

my_df = pd.DataFrame(my_list)

I then try to apply np.where() to this. Before doing any math, I first check that neither value is 'N/A', because I only want to convert them to floats if the condition is met:

my_df['New'] = np.where((my_df['Value A'].str != 'N/A') & 
                        (my_df['Value B'].str != 'N/A'),
                        my_df['Value A'].astype(float) - my_df['Value B'].astype(float),
                        'N/A')

However, when this is run, I get an error on the np.where line:

ValueError: could not convert string to float: N/A

I was under the impression that the conversion should not even have taken place, given that the condition should have failed when one of the values was 'N/A'.

Could anyone share any insight?

gmds · Accepted Answer

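In Python, all arguments to a function are evaluated before the function is called. So my_df['Value A'].astype(float) runs on the entire column, 'N/A' rows included, before np.where ever looks at the condition, and that is where the ValueError comes from. A for loop would give you the row-by-row behaviour you want, but it would be slow and ugly.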
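To make the evaluation order concrete, here is a minimal sketch on a three-element Series. The names values, mask and raised are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one 'N/A' string among numbers.
values = pd.Series([1, 'N/A', 3])

# Elementwise comparison against the string.
mask = values != 'N/A'

try:
    # values.astype(float) is computed for every row before np.where
    # is called, so the 'N/A' row is converted too and the cast fails.
    np.where(mask, values.astype(float), np.nan)
    raised = False
except ValueError:
    raised = True

print(mask.tolist())
print(raised)
```

The try/except is only there to keep the sketch runnable. The point is that the failure happens while the arguments are being built, not inside np.where.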

Instead, you should use pd.to_numeric:

converted = my_df[['Value A', 'Value B']].transform(pd.to_numeric, errors='coerce')
result = converted['Value A'] - converted['Value B']

print(result)

filled_result = result.fillna('N/A')

print(filled_result)

Output of the two print calls, result first and then filled_result:

0    -1.0
1     2.0
2     2.0
3     NaN
4     2.0
5     0.0
6     NaN
7     5.0
8    16.0
dtype: float64
0     -1
1      2
2      2
3    N/A
4      2
5      0
6    N/A
7      5
8     16
dtype: object
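If you want the result written back as the 'New' column from the question, something along these lines should work. This is a sketch on a shortened copy of the data; a, b, diff and both_numeric are illustrative names. It uses Series.where rather than np.where, on the assumption that you want the numbers to stay floats: np.where with a float array and a string would produce an array of strings.

```python
import pandas as pd

# Shortened stand-in for the question's DataFrame.
my_df = pd.DataFrame({'Value A': [1, 'N/A', 9],
                      'Value B': [2, 6, 'N/A']})

# Coerce anything non-numeric (here 'N/A') to NaN.
a = pd.to_numeric(my_df['Value A'], errors='coerce')
b = pd.to_numeric(my_df['Value B'], errors='coerce')

# True only where both inputs were numeric.
both_numeric = a.notna() & b.notna()
diff = a - b

# Cast to object first so floats and the 'N/A' string can share a column,
# then keep the difference where the mask holds and 'N/A' elsewhere.
my_df['New'] = diff.astype(object).where(both_numeric, 'N/A')

print(my_df)
```

Leaving the column as float with NaN, instead of filling in 'N/A', is usually easier to work with if more arithmetic follows.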
