Reputation: 1168
I have a dataframe df:
A B C D
0 6683.214355 0 0 0
1 6745.376465 0 0 -3
2 6786.318359 -1 0 2
3 4.000000 3 4 1
4 5.000000 -2 0 -2
For each column, I want to replace all values before the first positive, with 0, expected result:
A B C D
0 6683.214355 0 0 0
1 6745.376465 0 0 0 #replaced -3 colD with 0
2 6786.318359 0 0 2 #replaced -1 col B with 0
3 4.000000 3 4 1
4 5.000000 -2 0 -2
I have achieved the correct answer for this trial dataframe:
def remove_initial_sell (col):
col [:col.gt(0).idxmax()]= 0
return col
print (df.apply(remove_initial_sell))
Out:
A B C D
0 6683.214355 0 0 0
1 6745.376465 0 0 0
2 6786.318359 0 0 2
3 4.000000 3 4 1
4 5.000000 -2 0 -2
However it doesn't work with datasets that have dates as indexs (which is my original dataset format):
Date = ['2018-03-14 ', '2018-03-15 ', '2018-03-16 ']
B=pd.Series([ 6683.214355,6745.376465, 6786.318359], index=Date)
B.index = pd.to_datetime(B.index)
remove_initial_sell(B)
Out:
2018-03-14 0.000000 #didnt expect to replace the value
2018-03-15 6745.376465
2018-03-16 6786.318359
WHy isn't it working now with dates as index?
Below is the full code:
#@title Test removing elemnt if first negative
import pandas as pd
#an exmaple dataframe
df = pd.DataFrame({'A': [6683.214355,6745.376465, 6786.318359,4,5], 'B': [0, 0, -1, 3, -2], 'C' : [0, 0, 0, 4, 0], 'D': [0, -3, 2, 1, -2]} )
#replace all values before first positive value with 0
df_end = pd.DataFrame({'A': [ 6683.214355,6745.376465, 6786.318359,4,5], 'B': [0, 0, 0, 3, -2], 'C' : [0, 0, 0, 4, 0], 'D': [0, 0, 2, 1, -2]} )
df_end
def remove_initial_sell (col):
col [:col.gt(0).idxmax()]= 0
return col
#it finds the right index
print (df['A'].gt(0).idxmax())
print (df['B'].gt(0).idxmax())
print (df['C'].gt(0).idxmax())
print (df['D'].gt(0).idxmax())
#this is working
print (remove_initial_sell(df['A']))
print (remove_initial_sell(df['B']))
print (remove_initial_sell(df['C']))
print (remove_initial_sell(df['D']))
#this is working
print (df.apply(remove_initial_sell))
#Not working here for some reasonwith this test
Date = ['2018-03-14 ', '2018-03-15 ', '2018-03-16 ']
B=pd.Series([ 6683.214355,6745.376465, 6786.318359], index=Date)
B.index = pd.to_datetime(B.index)
print (remove_initial_sell(B))
Upvotes: 1
Views: 100
Reputation: 150745
loc
access with date index is a bit different than range index. More specifically:
df.loc[:'2020-01-01']
will include row with index 2020-01-01
if exists, while
df.loc[:3]
will not include row with index 3.
A way to go around this is to use iloc
which behaves similar to numpy slicing:
def remove_initial_sell (col):
idx = np.argmax(col.gt(0))
if idx>0: col.iloc [idx-1]= 0
return col
for col in df.columns:
remove_initial_sell(df[col])
Output:
A B C D
2018-03-14 6683.214355 0 0 0
2018-03-15 6745.376465 0 0 0
2018-03-16 6786.318359 0 0 2
2018-03-17 4.000000 3 4 1
2018-03-18 5.000000 -2 0 -2
Upvotes: 1
Reputation: 25239
You don't need apply
to achieve this. Just use cummax
and where
on the whole dataframe
df_final = df.where(df.gt(0).cummax(),0)
Out[129]:
A B C D
0 6683.214355 0 0 0
1 6745.376465 0 0 0
2 6786.318359 0 0 2
3 4.000000 3 4 1
4 5.000000 -2 0 -2
On datetime index:
In [131]: B.where(B.gt(0).cummax(),0)
Out[131]:
2018-03-14 6683.214355
2018-03-15 6745.376465
2018-03-16 6786.318359
dtype: float64
If you prefer a function:
def func(a_df):
return a_df.where(a_df.gt(0).cummax(), 0)
and call
df_final = func(df)
B_final = func(B)
Upvotes: 1