Reputation: 389
I have some typical stock data. I want to create a column called "Volume_Count" that will count the number of 0 volume days per quarter. My ultimate goal is to remove all stocks that have 0 volume for more than 5 days in a quarter. By creating this column, I can write a simple statement to subset Vol_Count > 5.
A typical Dataset:
Stock Date Qtr Volume
XYZ 1/1/19 2019 Q1 0
XYZ 1/2/19 2019 Q1 598
XYZ 1/3/19 2019 Q1 0
XYZ 1/4/19 2019 Q1 0
XYZ 1/5/19 2019 Q1 0
XYZ 1/6/19 2019 Q1 2195
XYZ 1/7/19 2019 Q1 0
... ... and so on (for multiple stocks and quarters)
This is what I've tried - a 1 liner -
df = df.groupby(['stock','Qtr'], as_index=False).filter(lambda x: len(x.Volume == 0) > 5)
However, as stated previously, this produced inconsistent results.
I want to remove the stock from the dataset only for the quarter where the volume == 0 for 5 or more days.
Note: I have multiple Stocks and Qtr in my dataset, therefore it's essential to groupby Qtr, Stock.
Desired Output: I want to keep the dataset but remove any stocks for a qtr if they have a volume = 0 for > 5 days.. that might entail a stock not being in the dataset for 2019 Q1 (because Vol == 0 >5 days) but being in the df in 2019 Q2 (Vol == 0 < 5 days)...
Upvotes: 1
Views: 85
Reputation: 153500
Try this:
df[df['Volume'].eq(0).groupby([df['Stock'],df['Qtr']]).transform('sum') < 5]
Details.
groupby
and transform
.Upvotes: 3