Reputation: 33
My goal is to drop the two stocks/tickers (or, more generally, the bottom decile) by volume: for each level 0 index Date, drop the entire rows of the 2 tickers (level 1) with the lowest volumes.
The DataFrame has already been sorted by volume, so for each date it's in ascending order of volume. The DataFrame might look like this (shortened to 5 stocks instead of 20):
Date Ticker col1 col2 col3 Volume
2020-01-01 stock1 - - - 5
stock2 - - - 10
stock3 - - - 20
stock4 - - - 40
stock5 - - - 43
2020-02-01 stock3 - - - 7
stock5 - - - 14
stock1 - - - 33
stock2 - - - 50
stock4 - - - 52
For the level 0 index Date "2020-01-01", I would want to drop stock1 and stock2, but for the next level 0 index Date "2020-02-01" I want to drop the new lowest two, which are stock3 and stock5.
Note: the real DataFrame will be much bigger, with more than just 5 stocks and many more months.
So far I have tried adding a decile column using qcut (since my real goal is to do this for 20 stocks), which will automatically give me the lowest two values by volume, but I wasn't able to replicate it for EACH level 0 date (I was only successful for one date and am not sure how to do it for each level 0 date).
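A minimal sketch of the per-date qcut idea (assuming a DataFrame with a (Date, Ticker) MultiIndex and a Volume column; the toy data and names here are only illustrative, and with 20 distinct volumes per date the bottom decile is exactly the two lowest rows):
import numpy as np
import pandas as pd

# Toy frame with a (Date, Ticker) MultiIndex and 20 tickers per date,
# mirroring the real layout; volumes are made-up, distinct values
idx = pd.MultiIndex.from_product(
    [['2020-01-01', '2020-02-01'], [f'stock{i}' for i in range(1, 21)]],
    names=['Date', 'Ticker'])
rng = np.random.default_rng(0)
df = pd.DataFrame({'Volume': rng.permutation(len(idx)) + 1}, index=idx)

# Decile label within each Date group (0 = lowest decile)
df['decile'] = (df.groupby(level='Date')['Volume']
                  .transform(lambda v: pd.qcut(v, 10, labels=False, duplicates='drop')))

# Drop the bottom decile (the two lowest-volume tickers) for each date
filtered = df[df['decile'] > 0].drop(columns='decile')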
I also tried nsmallest and nlargest, but encountered errors due to this being a DataFrame.
Do you have any suggestions as to how I can do this task? I feel as though I'm on the right path but I am missing something basic. Any insight is appreciated!
Upvotes: 0
Views: 253
Reputation: 11105
Since your DataFrame is already sorted by date and volume, you can drop the first 2 rows from each date group by adapting any of the answers to Python: Pandas - Delete the first row by group. For example:
import pandas as pd

# Create input data based on your example
d = {'Date': 5 * ['2020-01-01'] + 5 * ['2020-02-01'],
     'Ticker': ['stock1', 'stock2', 'stock3', 'stock4', 'stock5',
                'stock3', 'stock5', 'stock1', 'stock2', 'stock4'],
     'col1': 10 * ['-'],
     'Volume': [5, 10, 20, 40, 43, 7, 14, 33, 50, 52]}
df = pd.DataFrame(d)
# Get the first and second rows of each date group
to_del = df.groupby('Date', as_index=False).nth([0, 1])
# Intentionally duplicate the first and second rows of each date
# group, then remove them with drop_duplicates with keep=False to
# drop *all* duplicated rows without keeping first occurrences
res = pd.concat([df, to_del]).drop_duplicates(keep=False)
print(res)
Date Ticker col1 Volume
2 2020-01-01 stock3 - 20
3 2020-01-01 stock4 - 40
4 2020-01-01 stock5 - 43
7 2020-02-01 stock1 - 33
8 2020-02-01 stock2 - 50
9 2020-02-01 stock4 - 52
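As a side note (just an equivalent sketch under the same assumptions, not part of the trick above): because each date group is already sorted by volume in ascending order, you could also filter on the within-group position directly and skip the concat/drop_duplicates step:
# cumcount() numbers the rows 0, 1, 2, ... within each date group;
# keeping positions >= 2 drops the two lowest-volume rows per date
res = df[df.groupby('Date').cumcount() >= 2]
print(res)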
Upvotes: 1