How to sort output within a group by date?

Question

I have a dataframe which is the result of screening through all the S&P500 stocks and is classifying some SOS signals found during the screening and the date where each signal is appearing.

For this dataframe I'm applying the following filter, because I only want to be focused on stocks which have these signals: Test in Rising, Absorption Vol, Stopping Vol for each ticker

s = {'Stopping Vol', 'Absorption Vol', 'Test in Rising'}
df.groupby('Ticker').filter(lambda sf: s.issubset(sf['SOS']))

And this is the output:

     Ticker             SOS        Date  frequency
756    BXSL  Test in Rising  2021-12-21         11
757    BXSL    Stopping Vol  2021-12-27         11
758    BXSL    Stopping Vol  2021-12-28         11
759    BXSL  Selling Climax  2021-12-28         11
760    BXSL        Shakeout  2021-12-28         11
761    BXSL  Absorption Vol  2021-12-30         11
762    BXSL     Bag Holding  2021-12-30         11
763    BXSL  Test in Rising  2021-12-30         11
764    BXSL    Stopping Vol  2022-01-03         11
765    BXSL  Absorption Vol  2022-01-06         11
766    BXSL  Test in Rising  2022-01-06         11
5057   WDAY  Test in Rising  2021-12-16          9
5058   WDAY            Test  2021-12-20          9
5059   WDAY  Test in Rising  2021-12-20          9
5060   WDAY  Test in Rising  2021-12-22          9
5061   WDAY  Test in Rising  2021-12-29          9
5062   WDAY    Stopping Vol  2022-01-04          9
5063   WDAY  Selling Climax  2022-01-04          9
5064   WDAY        Shakeout  2022-01-04          9
5065   WDAY  Absorption Vol  2022-01-05          9
1167   CROX    Stopping Vol  2021-12-17          9

The problem I have now is that I only want as a result those tickers where these signals are met, but I want also to keep the order of the signals by date of appearance. I want to show Ticker only when I see Stopping Vol signal first, then a Absorption Vol signal and finally a Test in Rising signal. It doesn't matter if there are any other signals in between, I just want to keep that order in between them.

So the output would be something similar to:

     Ticker             SOS        Date  frequency
756    JPGY  Selling Climax  2021-12-21         11
757    JPGY    Stopping Vol  2021-12-27         11
758    JPGY    Other signal  2021-12-28         11
761    JPGY  Absorption Vol  2021-12-30         11
763    JPGY  Test in Rising  2021-12-31         11

How can I get this?

I'm having a look to this other question and don't know how to progress. pandas groupby sort within groups

jezrael · Accepted Answer

Use:

#in set is not defined order, so use list ot tuple
L = ['Stopping Vol', 'Absorption Vol', 'Test in Rising']

#filter rows if match all 3 values
df1 = df.groupby('Ticker').filter(lambda sf: set(L).issubset(sf['SOS']))

#get only rows with L
df1 = df1[df1['SOS'].isin(L)]

#get first and last values per filtered df1
df1 = df1.groupby(['Ticker'])['SOS'].agg(['first','last'])

#and test if first column is first value in L and last column last value in L
#middle values is already tested, so got only groups in this order
tickers = df1.index[df1['first'].eq(L[0]) & df1['last'].eq(L[2])]
df2 = df[df['Ticker'].isin(tickers)]
print (df2)

How to sort output within a group by date?

Answers (1)

Related Questions