Reputation: 741
I'm seeing some unexpected behaviour with numpy. I am trying to add a column to a DataFrame that does some math on two other columns. Those columns also contain a few 'N/A' strings.
import pandas as pd
import numpy as np
my_list = []
my_list.append({'Value A':1, 'Value B':2})
my_list.append({'Value A':6, 'Value B':4})
my_list.append({'Value A':7, 'Value B':5})
my_list.append({'Value A':'N/A', 'Value B':6})
my_list.append({'Value A':12, 'Value B':10})
my_list.append({'Value A':2, 'Value B':2})
my_list.append({'Value A':9, 'Value B':'N/A'})
my_list.append({'Value A':8, 'Value B':3})
my_list.append({'Value A':22, 'Value B':6})
my_df = pd.DataFrame(my_list)
I then try to use np.where() on this. Before doing any math, I check that neither value is 'N/A', because the values are converted to floats when the condition is met:
my_df['New'] = np.where((my_df['Value A'].str != 'N/A') &
(my_df['Value B'].str != 'N/A'),
my_df['Value A'].astype(float) - my_df['Value B'].astype(float),
'N/A')
However, when this is run, I get an error on the numpy.where call:
ValueError: could not convert string to float: N/A
I was under the impression that the conversion should not even have taken place, given that the condition should have failed when one of the values was 'N/A'.
Could anyone share any insight?
Upvotes: 0
Views: 451
Reputation: 19885
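In general, all the arguments to a Python function are evaluated before the function is called. Here that means both astype(float) calls run on the entire columns, 'N/A' rows included, before np.where ever looks at the condition. The behaviour you want would be present in a for loop, but that would be slow and ugly.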
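To make the evaluation order concrete, here is a minimal sketch. The helper `risky` and the `calls` list are made up for illustration and are not part of the question's code. Even though the mask is all False, the helper still runs and fails, because its result has to exist before np.where can be called:

```python
import numpy as np

calls = []  # hypothetical log, used only to show that the helper ran

def risky(values):
    # Hypothetical helper: records the call, then converts every element.
    calls.append("risky")
    return np.array([float(v) for v in values])

mask = np.array([False, False])  # no element would ever be picked from risky(...)

try:
    # risky(...) is evaluated first, so float("N/A") raises before np.where starts
    np.where(mask, risky(["N/A", "1"]), 0.0)
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(calls, outcome)
```

The condition only decides which already-computed elements are selected; it cannot prevent either branch from being computed.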
Instead, you should use pd.to_numeric with errors='coerce', which turns anything that cannot be parsed as a number into NaN:
converted = my_df[['Value A', 'Value B']].transform(pd.to_numeric, errors='coerce')
result = converted['Value A'] - converted['Value B']
print(result)
filled_result = result.fillna('N/A')
print(filled_result)
Output:
0 -1.0
1 2.0
2 2.0
3 NaN
4 2.0
5 0.0
6 NaN
7 5.0
8 16.0
dtype: float64
0 -1
1 2
2 2
3 N/A
4 2
5 0
6 N/A
7 5
8 16
dtype: object
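If you would rather convert only the rows that pass the check, which is what the np.where attempt was aiming for, one possible sketch is to build a boolean mask and assign through .loc. The smaller sample frame and the names `valid` and `new` are assumptions made for this illustration:

```python
import pandas as pd

# Smaller stand-in for the question's data (assumed for illustration)
my_df = pd.DataFrame({
    'Value A': [1, 6, 'N/A', 9],
    'Value B': [2, 4, 6, 'N/A'],
})

# True only where neither column holds the 'N/A' placeholder
valid = my_df['Value A'].ne('N/A') & my_df['Value B'].ne('N/A')

# Start from an object column of placeholders, then fill in the valid rows
new = pd.Series('N/A', index=my_df.index, dtype=object)
new.loc[valid] = (
    my_df.loc[valid, 'Value A'].astype(float)
    - my_df.loc[valid, 'Value B'].astype(float)
)
my_df['New'] = new
print(my_df)
```

Here astype(float) only ever sees the rows selected by the mask, so the 'N/A' strings are never converted. The resulting column has object dtype because it mixes floats and strings, just like the filled result above.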
Upvotes: 2