P.H.


Updating a pandas floating number based on value produces unexpected result

I'm trying to do something very simple in pandas, and I am obviously missing something. The goal is to take the values in column a and change them to either 1.0 or 0.0, depending on whether the original value was greater than or equal to 4.0.

I thought I understood the required syntax from the question "Replacing column values in a pandas DataFrame":

import pandas as pd

df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})
ex = pd.DataFrame({'a': [0.0, 1.0, 1.0], 'b': [2.2, 3.0, 4.0]})
print("input data")
print(df)
print("expected result")
print(ex)

# df.loc[<row selection>, <column selection>]
df.loc[df.a >= 4.0, 'a'] = 1.0
df.loc[df.a < 4.0, 'a'] = 0.0
print("actual result")
print(df)
# reset df to the original input data before retrying
df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})

print("retry using .abs()")
df.loc[df.a.abs() >= 4.0, 'a'] = 1.0
df.loc[df.a.abs() < 4.0, 'a'] = 0.0
print("actual result")
print(df)

Here's the matching output:

input data
     a    b
0  3.5  2.2
1  4.0  3.0
2  4.1  4.0
expected result
     a    b
0  0.0  2.2
1  1.0  3.0
2  1.0  4.0
actual result
     a    b
0  0.0  2.2
1  0.0  3.0
2  0.0  4.0
retry using .abs()
actual result
     a    b
0  0.0  2.2
1  0.0  3.0
2  0.0  4.0

I was expecting the last two rows (index 1 and 2) to be set to 1.0, but instead all of the values in column a are 0.0.

Thanks for your help.



Answers (1)

BENY


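Your first assignment overwrites the original values: every value greater than or equal to 4.0 becomes 1.0. By the time the second condition a < 4.0 is evaluated, the column holds 3.5, 1.0 and 1.0, all of which are less than 4.0, so the condition is True for every row. You should do the conversion in one step: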

df.a = df.a.ge(4.0).astype(int)
df
   a    b
0  0  2.2
1  1  3.0
2  1  4.0
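The one-liner above produces integers, while the expected result in the question keeps column a as floats (1.0/0.0). A minimal sketch, assuming floats are what you want: cast the boolean mask to float instead of int. The numpy.where version is shown only as an equivalent alternative; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})

# Build the mask from the original values once, then cast it:
# True -> 1.0, False -> 0.0
df['a'] = df['a'].ge(4.0).astype(float)
print(df)

# Equivalent alternative with numpy.where (same threshold, float result)
df2 = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})
df2['a'] = np.where(df2['a'] >= 4.0, 1.0, 0.0)
print(df2)
```

Either way the condition is evaluated only once, on the untouched data, so there is no second pass that can see already-overwritten values.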

More Info

df.loc[df.a >= 4.0, 'a'] = 1.0
df
     a    b
0  3.5  2.2
1  1.0  3.0
2  1.0  4.0

Then:

df.a<4
0    True
1    True
2    True
Name: a, dtype: bool

So df.loc[df.a < 4.0, 'a'] = 0.0 overwrites every value with 0.0.
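If you prefer to keep the two .loc assignments from the question, a minimal sketch is to compute the mask from the original values before either assignment, so the second step does not look at the already-overwritten column. The name mask is just an illustrative variable, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})

# Evaluate the condition once, on the original values;
# the comparison returns a new boolean Series that later assignments do not change
mask = df['a'] >= 4.0

# Both assignments use the same saved mask
df.loc[mask, 'a'] = 1.0
df.loc[~mask, 'a'] = 0.0
print(df)
```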

