Reputation: 45
I've been trying without success to find a way to create an "average_gain_up" in python and have gotten a bit stuck. Being new to groupby there is something of how it is treating functions that i've not managed to grasp so any intuition behind how to think through these types of problems would be helpful.
Problem: Create a rolling 14 day sum, only summing if the value is >0 .
new=pd.DataFrame([[1,-2,3,-2,4,5],['a','a','a','b','b','b']])
new= new.T #transposing into a friendly groupby format
#Group by a or b, filter to only have positive values and then sum rolling, we
keep NAs to ensure the sum is ran over 14 values.
groupby=new.groupby(1)[0].filter(lambda x: x>0,dropna=False).rolling(14).sum()
Intended Sum Frame:
x.all()/len(x) result:
this throws a type error "the filter must return a boolean result" . from reading other answers, I understand as i'm asking if a series/frame is superior to 0 . The above code works with len(x), again makes sense in that context.
i tried with all() as well but it doesn't behave as intended. the .all() functions returns a single boolean per group and the sum is then just a simple rolling sum.
i've tried creating a list of booleans to say which values are positive and which are not but that also yields an error, this time i'm not sure why.
groupby1=new.groupby(1)[0]
groupby2=[y>0 for x in groupby1 for y in x[1] ]
groupby_try=new.groupby(1)[0].filter(lambda x:groupby2,dropna=False).rolling(2).sum()
1) how do i make the above code work and what is wrong in how i am thinking about it ?
2) is this the "best Practice" way to do these types of operations ?
any help appreciated, let me know if i've missed anything or any further clarification is needed.
Upvotes: 1
Views: 431
Reputation: 29635
According to the doc on filter
after a groupby
, it is not supposed to filter
values within a group but groups as a whole if they don't meet some criteria, such as if the sum
of all the elements of the group is above 2 then the group is kept in the first example given
One way could be to replace all the negative values by 0 in new[0]
first, using np.clip
for example, and then groupby
, rolling
and sum
such as
print (np.clip(new[0],0,np.inf).groupby(new[1]).rolling(2).sum())
1
a 0 NaN
1 1.0
2 3.0
b 3 NaN
4 4.0
5 9.0
Name: 0, dtype: float64
This way prevents from modifying the data in new
, if you don't mind you can change the column 0 with new[0] = np.clip(new[0],0,np.inf)
and then do new.groupby(1)[0].rolling(2).sum()
which give the same result.
Upvotes: 2