Reputation: 14641
I have the following dataframe:
actual_credit min_required_credit
0 0.3 0.4
1 0.5 0.2
2 0.4 0.4
3 0.2 0.3
I need to add a column indicating where actual_credit >= min_required_credit. The result would be:
actual_credit min_required_credit result
0 0.3 0.4 False
1 0.5 0.2 True
2 0.4 0.4 True
3 0.1 0.3 False
I am doing the following:
df['result'] = abs(df['actual_credit']) >= abs(df['min_required_credit'])
However the 3rd row (0.4 and 0.4) is constantly resulting in False. After researching this issue at various places including: What is the best way to compare floats for almost-equality in Python? I still can't get this to work. Whenever the two columns have an identical value, the result is False which is not correct.
I am using python 3.3
Upvotes: 42
Views: 63339
Reputation: 167
@EdChum's answer works great, but using the pandas.DataFrame.round function is another clean option that works well without the use of numpy
.
df = pd.DataFrame( # adding a small difference at the thousandths place to reproduce the issue
data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
columns=['actual_credit', 'min_required_credit'])
df['result'] = df['actual_credit'].round(1) >= df['min_required_credit'].round(1)
print(df)
actual_credit min_required_credit result
0 0.3 0.400 False
1 0.5 0.200 True
2 0.4 0.401 True
3 0.2 0.300 False
You might consider using round()
to more permanently edit your dataframe, depending if you desire that precision or not. In this example, it seems like the OP suggests this is probably just noise and is just causing confusion.
df = pd.DataFrame( # adding a small difference at the thousandths place to reproduce the issue
data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
columns=['actual_credit', 'min_required_credit'])
df = df.round(1)
df['result'] = df['actual_credit'] >= df['min_required_credit']
print(df)
actual_credit min_required_credit result
0 0.3 0.4 False
1 0.5 0.2 True
2 0.4 0.4 True
3 0.2 0.3 False
Upvotes: 10
Reputation: 14958
In general numpy
Comparison functions work well with pd.Series
and allow for element-wise comparisons:
isclose
, allclose
, greater
, greater_equal
, less
, less_equal
etc.
In your case greater_equal
would do:
df['result'] = np.greater_equal(df['actual_credit'], df['min_required_credit'])
or alternatively, as proposed, using pandas.ge
(alternatively le
, gt
etc.):
df['result'] = df['actual_credit'].ge(df['min_required_credit'])
The risk with or
ing with ge
(as mentioned above) is that e.g. comparing 3.999999999999
and 4.0
might return True
which might not necessarily be what you want.
Upvotes: 1
Reputation: 393963
Due to imprecise float comparison you can or
your comparison with np.isclose
, isclose
takes a relative and absolute tolerance param so the following should work:
df['result'] = df['actual_credit'].ge(df['min_required_credit']) | np.isclose(df['actual_credit'], df['min_required_credit'])
Upvotes: 54
Reputation: 500227
Use pandas.DataFrame.abs()
instead of the built-in abs()
:
df['result'] = df['actual_credit'].abs() >= df['min_required_credit'].abs()
Upvotes: -4