Leo
Leo

Reputation: 1168

Pandas: Replace col values before first postive with 0?

I have a dataframe df:

             A  B  C  D
0  6683.214355  0  0  0
1  6745.376465  0  0 -3
2  6786.318359 -1  0  2
3     4.000000  3  4  1
4     5.000000 -2  0 -2

For each column, I want to replace all values before the first positive, with 0, expected result:

             A  B  C  D
0  6683.214355  0  0  0
1  6745.376465  0  0  0 #replaced -3 colD with 0
2  6786.318359  0  0  2 #replaced -1 col B with 0
3     4.000000  3  4  1
4     5.000000 -2  0 -2

I have achieved the correct answer for this trial dataframe:

def remove_initial_sell (col):
  col [:col.gt(0).idxmax()]= 0 
  return col

print (df.apply(remove_initial_sell))

Out: 
             A  B  C  D
0  6683.214355  0  0  0
1  6745.376465  0  0  0
2  6786.318359  0  0  2
3     4.000000  3  4  1
4     5.000000 -2  0 -2

However it doesn't work with datasets that have dates as indexs (which is my original dataset format):

Date = ['2018-03-14 ', '2018-03-15 ', '2018-03-16 ']
B=pd.Series([ 6683.214355,6745.376465, 6786.318359], index=Date)
B.index = pd.to_datetime(B.index)
remove_initial_sell(B)

Out: 

2018-03-14       0.000000 #didnt expect to replace the value
2018-03-15    6745.376465
2018-03-16    6786.318359

WHy isn't it working now with dates as index?

Below is the full code:

#@title Test removing elemnt if first negative
import pandas as pd  

#an exmaple dataframe 
df = pd.DataFrame({'A': [6683.214355,6745.376465, 6786.318359,4,5], 'B': [0, 0, -1, 3, -2], 'C' : [0, 0, 0, 4, 0], 'D': [0, -3, 2, 1, -2]} ) 

#replace all values before first positive value with 0
df_end = pd.DataFrame({'A': [ 6683.214355,6745.376465, 6786.318359,4,5], 'B': [0, 0, 0, 3, -2], 'C' : [0, 0, 0, 4, 0], 'D': [0, 0, 2, 1, -2]} ) 
df_end

def remove_initial_sell (col):
  col [:col.gt(0).idxmax()]= 0 
  return col
#it finds the right index 
print (df['A'].gt(0).idxmax())
print (df['B'].gt(0).idxmax())
print (df['C'].gt(0).idxmax())
print (df['D'].gt(0).idxmax())

#this is working 
print (remove_initial_sell(df['A']))
print (remove_initial_sell(df['B']))
print (remove_initial_sell(df['C']))
print (remove_initial_sell(df['D']))

#this is working
print (df.apply(remove_initial_sell))


#Not working here for some reasonwith this test 
Date = ['2018-03-14 ', '2018-03-15 ', '2018-03-16 ']
B=pd.Series([ 6683.214355,6745.376465, 6786.318359], index=Date)
B.index = pd.to_datetime(B.index)
print (remove_initial_sell(B))

Upvotes: 1

Views: 100

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150745

loc access with date index is a bit different than range index. More specifically:

df.loc[:'2020-01-01']

will include row with index 2020-01-01 if exists, while

df.loc[:3]

will not include row with index 3.

A way to go around this is to use iloc which behaves similar to numpy slicing:

def remove_initial_sell (col):
    idx = np.argmax(col.gt(0))
    if idx>0: col.iloc [idx-1]= 0 
    return col

for col in df.columns:
    remove_initial_sell(df[col])

Output:

                      A  B  C  D
2018-03-14  6683.214355  0  0  0
2018-03-15  6745.376465  0  0  0
2018-03-16  6786.318359  0  0  2
2018-03-17     4.000000  3  4  1
2018-03-18     5.000000 -2  0 -2

Upvotes: 1

Andy L.
Andy L.

Reputation: 25239

You don't need apply to achieve this. Just use cummax and where on the whole dataframe

df_final = df.where(df.gt(0).cummax(),0)

Out[129]:
             A  B  C  D
0  6683.214355  0  0  0
1  6745.376465  0  0  0
2  6786.318359  0  0  2
3     4.000000  3  4  1
4     5.000000 -2  0 -2

On datetime index:

In [131]: B.where(B.gt(0).cummax(),0)
Out[131]:
2018-03-14    6683.214355
2018-03-15    6745.376465
2018-03-16    6786.318359
dtype: float64

If you prefer a function:

def func(a_df):
    return a_df.where(a_df.gt(0).cummax(), 0)

and call

df_final = func(df)
B_final = func(B)

Upvotes: 1

Related Questions