clockworkss
clockworkss

Reputation: 45

Conditional Rolling Sum using filter on groupby group rows

I've been trying without success to find a way to create an "average_gain_up" in python and have gotten a bit stuck. Being new to groupby there is something of how it is treating functions that i've not managed to grasp so any intuition behind how to think through these types of problems would be helpful.

Problem: Create a rolling 14 day sum, only summing if the value is >0 .

new=pd.DataFrame([[1,-2,3,-2,4,5],['a','a','a','b','b','b']])
new= new.T #transposing into a friendly groupby format

#Group by a or b, filter to only have positive values and then sum rolling, we 
keep NAs to ensure the sum is ran over 14 values. 
groupby=new.groupby(1)[0].filter(lambda x: x>0,dropna=False).rolling(14).sum()

frame

Intended Sum Frame:

enter image description here

x.all()/len(x) result:

enter image description here

this throws a type error "the filter must return a boolean result" . from reading other answers, I understand as i'm asking if a series/frame is superior to 0 . The above code works with len(x), again makes sense in that context.

i tried with all() as well but it doesn't behave as intended. the .all() functions returns a single boolean per group and the sum is then just a simple rolling sum.

i've tried creating a list of booleans to say which values are positive and which are not but that also yields an error, this time i'm not sure why.

groupby1=new.groupby(1)[0]
groupby2=[y>0 for x in groupby1 for y in x[1] ]
groupby_try=new.groupby(1)[0].filter(lambda x:groupby2,dropna=False).rolling(2).sum()

1) how do i make the above code work and what is wrong in how i am thinking about it ?

2) is this the "best Practice" way to do these types of operations ?

any help appreciated, let me know if i've missed anything or any further clarification is needed.

Upvotes: 1

Views: 431

Answers (1)

Ben.T
Ben.T

Reputation: 29635

According to the doc on filter after a groupby, it is not supposed to filter values within a group but groups as a whole if they don't meet some criteria, such as if the sum of all the elements of the group is above 2 then the group is kept in the first example given

One way could be to replace all the negative values by 0 in new[0] first, using np.clip for example, and then groupby, rolling and sum such as

print (np.clip(new[0],0,np.inf).groupby(new[1]).rolling(2).sum())
1   
a  0    NaN
   1    1.0
   2    3.0
b  3    NaN
   4    4.0
   5    9.0
Name: 0, dtype: float64

This way prevents from modifying the data in new, if you don't mind you can change the column 0 with new[0] = np.clip(new[0],0,np.inf) and then do new.groupby(1)[0].rolling(2).sum() which give the same result.

Upvotes: 2

Related Questions