Reputation: 499
I am trying to create a new column in a pandas dataframe using a very complex if statement (simplified below for clarity). I keep getting the error: ("'float' object has no attribute 'shift'", 'occurred at index 0'). I have searched Stack Overflow and the wider internet without finding a good answer to my problem. Some answers involve taking the .shift out of the function; however, I need it inside a function because of the complexity of the if statement I am writing.
I have attached an image below detailing what I ultimately want the function to do. I believe it explains it better than I could describe it with words. Any help or guidance would be greatly appreciated.
Please let me know if you have any questions or if I can clarify anything!
Code example
df = pd.read_csv(file)
def ubk(df):
    x = df['k_calc'].shift(1)
    if x < 90:
        return 1
    elif x > 90:
        return 2
df['test'] = df.apply(ubk, axis=1)
Upvotes: 1
Views: 5490
Reputation: 1514
Why don't you just do this:
df['test'] = 1+(df['k_calc'].shift(1)>=90).astype(int)
The error occurs because apply may not work the way you expect. When you call df.apply(ubk, axis=1), pandas applies ubk to every row of your dataframe. As a result, inside the function, df is not your original dataframe but one of its rows. So when you do x = df['k_calc'].shift(1), df['k_calc'] is a single entry (a float), and pandas complains because a float has no shift() method.
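To see this for yourself, here is a minimal sketch that records what the function actually receives. The sample values and the helper name inspect_row are made up for illustration; only the k_calc column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name k_calc is from the question.
df = pd.DataFrame({"k_calc": [75.0, 79.7297, 92.5675, 66.2116]})

observations = []

def inspect_row(row):
    # With axis=1, `row` is a Series holding one row of df.
    value = row["k_calc"]
    # Record whether the row is a Series and whether the scalar has .shift
    observations.append((isinstance(row, pd.Series), hasattr(value, "shift")))
    return value

df.apply(inspect_row, axis=1)

# The whole column is a Series, so shift() is available there.
previous = df["k_calc"].shift(1)
print(observations[0])
print(previous)
```

Each row arrives as a Series, and the value pulled out of it is a plain scalar with no shift method, which is why the shift has to be called on the full column instead.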
Upvotes: 1
Reputation: 25269
You can pass additional arguments to apply if you want. In this case you can pass the main df, and ubk can then process it however you need. I don't know the exact purpose of your ubk, so I only modified it to do what you describe for the column test. Your logic does not look efficient, but you may have your own reasons for using it, so it is up to you.
Sample data:
In [301]: df
Out[301]:
lowest_low k_calc d_cal
0 9.07 75.0000 NaN
1 9.07 79.7297 NaN
2 9.07 92.5675 NaN
3 9.07 66.2116 78.3772
Define the function and call apply to create the test column with this condition: if the previous cell of k_calc is < 90, return 1; if it is > 90, return 2:
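def ubk(s, m_df):
    # s is the current row; s.name is that row's index label in m_df
    x = m_df['k_calc'].shift(1)[s.name]
    if x < 90:
        return 1
    elif x > 90:
        return 2
    # falls through (returns None) when x is NaN or exactly 90
df['test'] = df.apply(ubk, axis=1, args=(df,))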
Out[304]:
lowest_low k_calc d_cal test
0 9.07 75.0000 NaN NaN
1 9.07 79.7297 NaN 1.0
2 9.07 92.5675 NaN 1.0
3 9.07 66.2116 78.3772 2.0
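If the real condition has several branches, a vectorized version may be worth considering, since the approach above recomputes the shifted column for every row. The following is only a sketch using numpy.select, under the assumption that each branch can be written as a boolean expression on whole columns; the variable names prev, conditions and choices are my own.

```python
import numpy as np
import pandas as pd

# Same k_calc values as the sample data above.
df = pd.DataFrame({"k_calc": [75.0, 79.7297, 92.5675, 66.2116]})

# Shift once on the whole column instead of once per row.
prev = df["k_calc"].shift(1)

# One boolean Series per branch of the if/elif chain, in order.
conditions = [prev < 90, prev > 90]
choices = [1.0, 2.0]

# Rows matching no condition (NaN or exactly 90) get NaN,
# mirroring the function falling through without a return.
df["test"] = np.select(conditions, choices, default=np.nan)
print(df)
```

More branches can be added by appending to conditions and choices; the first matching condition wins, like an if/elif chain.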
Upvotes: 3