Reputation: 481
I'm trying to do something very simple in pandas and I am obviously missing something. The goal is to replace each value in column a with either 1.0 or 0.0, depending on whether the original value was greater than or equal to 4.0.
I thought I understood the required syntax from reading "Replacing column values in a pandas DataFrame".
import pandas as pd
df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})
ex = pd.DataFrame({'a': [0.0, 1.0, 1.0], 'b': [2.2, 3.0, 4.0]})
print("input data")
print(df)
print("expected result")
print(ex)
# df.loc[<row selection>, <column selection>]
df.loc[df.a >= 4.0, 'a'] = 1.0
df.loc[df.a < 4.0, 'a'] = 0.0
print("actual result")
print(df)
df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})
print("retry using .abs()")
df.loc[df.a.abs() >= 4.0, 'a'] = 1.0
df.loc[df.a.abs() < 4.0, 'a'] = 0.0
print("actual result")
print(df)
Here's the matching output:
input data
a b
0 3.5 2.2
1 4.0 3.0
2 4.1 4.0
expected result
a b
0 0.0 2.2
1 1.0 3.0
2 1.0 4.0
actual result
a b
0 0.0 2.2
1 0.0 3.0
2 0.0 4.0
retry using .abs()
actual result
a b
0 0.0 2.2
1 0.0 3.0
2 0.0 4.0
I was expecting the second and third rows (index 1 and 2) to be set to 1.0, but instead all of the values in a are 0.0.
Thanks for your help.
Upvotes: 1
Views: 118
Reputation: 323226
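Your first assignment overwrites the original values: every value greater than or equal to 4.0 becomes 1.0. By the time the second condition, a < 4.0, is evaluated, it is True for every row, because the remaining values are 3.5, 1.0 and 1.0. You should do both replacements in a single step: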
df.a = df.a.ge(4.0).astype(int)
df
a b
0 0 2.2
1 1 3.0
2 1 4.0
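Note that astype(int) produces integers, while the expected result in the question shows floats. If the float dtype matters, a minimal sketch (assuming the same df as in the question) is to cast the boolean comparison to float instead:

```python
import pandas as pd

df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})

# Compare once against the original values, then cast True/False to 1.0/0.0
df['a'] = df['a'].ge(4.0).astype(float)
print(df)
```

Column b is left untouched, and a keeps a float dtype.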
More Info
df.loc[df.a >= 4.0, 'a'] = 1.0
df
a b
0 3.5 2.2
1 1.0 3.0
2 1.0 4.0
Then,
df.a<4
0 True
1 True
2 True
Name: a, dtype: bool
So df.loc[df.a < 4.0, 'a'] = 0.0 then overwrites every value in a with 0.0, including the ones that were just set to 1.0.
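If you would rather keep the two df.loc assignments, one way to avoid the problem is to compute the condition once, before anything is modified, and reuse it. This is only a sketch; the name mask is my own choice and not something from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [3.5, 4.0, 4.1], 'b': [2.2, 3.0, 4.0]})

# Evaluate the condition against the ORIGINAL values and store it
mask = df['a'] >= 4.0

# Both assignments now use the stored mask, so the first one
# cannot change which rows the second one selects
df.loc[mask, 'a'] = 1.0
df.loc[~mask, 'a'] = 0.0
print(df)
```

Because mask is fixed before the first assignment, the order of the two assignments no longer matters.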
Upvotes: 1