subset by counting the number of times 0 occurs in a column after groupby in python

Question

I have some typical stock data. I want to create a column called "Volume_Count" that will count the number of 0 volume days per quarter. My ultimate goal is to remove all stocks that have 0 volume for more than 5 days in a quarter. By creating this column, I can write a simple statement to subset Vol_Count > 5.

A typical Dataset:

Stock     Date      Qtr     Volume
 XYZ     1/1/19   2019 Q1     0
 XYZ     1/2/19   2019 Q1     598
 XYZ     1/3/19   2019 Q1     0 
 XYZ     1/4/19   2019 Q1     0
 XYZ     1/5/19   2019 Q1     0
 XYZ     1/6/19   2019 Q1     2195
 XYZ     1/7/19   2019 Q1     0
 ... ... and so on (for multiple stocks and quarters)

This is what I've tried - a 1 liner -

df = df.groupby(['stock','Qtr'], as_index=False).filter(lambda x: len(x.Volume == 0) > 5)

However, as stated previously, this produced inconsistent results.

I want to remove the stock from the dataset only for the quarter where the volume == 0 for 5 or more days.

Note: I have multiple Stocks and Qtr in my dataset, therefore it's essential to groupby Qtr, Stock.

Desired Output: I want to keep the dataset but remove any stocks for a qtr if they have a volume = 0 for > 5 days.. that might entail a stock not being in the dataset for 2019 Q1 (because Vol == 0 >5 days) but being in the df in 2019 Q2 (Vol == 0 < 5 days)...

Scott Boston · Accepted Answer

Try this:

df[df['Volume'].eq(0).groupby([df['Stock'],df['Qtr']]).transform('sum') < 5]

Details.

First take the Volume column of your dataframe and check to see if it zero for each record.
Next, group that column by 'Stock' and 'Qtr' columns and get a sum of each True values from step 1 assign that sum to each record using groupby and transform.
Create boolean series from that sum where True if less than 5 and use that series to boolean index your original dataframe.

subset by counting the number of times 0 occurs in a column after groupby in python

Answers (1)

Related Questions