Reputation: 1085
I have a dataframe like this:
StringCol  Timestamp  GroupID  Flag
xyz        20170101   123      yes
abc        20170101   123      yes
def        20170101   123      yes
ghi        20170101   123      no
abc        20170101   124      yes
jkl        20170101   124      yes
pqr        20170101   124      no
klm        20170101   124      yes
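For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: storing Timestamp and GroupID as plain integers is an assumption, since the table does not show the dtypes.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: Timestamp and GroupID are stored as integers.
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})

print(df)
```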
I want to group this by GroupID and, for each group, keep the rows where Flag is "no" together with the X rows immediately before each of them (the dataframe is already sorted by GroupID and Timestamp).
So, if X = 2, I want the result to be something like:
StringCol  Timestamp  GroupID  Flag
abc        20170101   123      yes
def        20170101   123      yes
ghi        20170101   123      no
abc        20170101   124      yes
jkl        20170101   124      yes
pqr        20170101   124      no
How do I achieve this? Thanks.
Upvotes: 1
Views: 111
Reputation: 323306
If you only need the last "no" in each group, try drop_duplicates:
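import itertools

X = 2  # number of preceding rows to keep
# Last 'no' row of each group (assumes df has the default 0..n-1 integer index).
last_no = df[df['Flag'].eq('no')].drop_duplicates(['GroupID'], keep='last')
# Index labels from X rows before each 'no' up to and including the 'no' itself.
# Note: this does not stop at a group boundary if fewer than X rows precede the 'no'.
ranges = [range(max(i - X, 0), i + 1) for i in last_no.index]
df.loc[list(itertools.chain.from_iterable(ranges))]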
Out[512]:
  StringCol  Timestamp  GroupID Flag
1       abc   20170101      123  yes
2       def   20170101      123  yes
3       ghi   20170101      123   no
4       abc   20170101      124  yes
5       jkl   20170101      124  yes
6       pqr   20170101      124   no
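If you need the X rows before every "no" (not just the last one per group), a mask built from group-wise shifts is one option. This is a rough sketch rather than a tested recipe; the function name rows_before_each_no and its parameters are made up for illustration. Because the shift is done per group, the window never reaches back into the previous group.

```python
import pandas as pd

def rows_before_each_no(df, x, group_col="GroupID", flag_col="Flag"):
    """Keep every 'no' row plus up to x rows before it within the same group."""
    # 1 where the flag is 'no', 0 elsewhere
    is_no = df[flag_col].eq("no").astype(int)
    keep = is_no.eq(1)
    for k in range(1, x + 1):
        # Value of is_no k rows later in the same group (NaN past the group's end)
        later = is_no.groupby(df[group_col]).shift(-k)
        keep = keep | later.eq(1)
    return df[keep]

# Sample data from the question (integer dtypes are an assumption)
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})

result = rows_before_each_no(df, 2)
print(result)
```

The loop runs X times over whole columns, so it should stay cheap as long as X is small compared with the number of rows.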
Upvotes: 1
Reputation: 402613
This gets the previous X items (here X = 2) for the last "no" flag in each group.
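X = 2  # number of rows to keep before the last 'no' in each group

def prevK(g):
    # Position of the last 'no' in the group (assumes the group has at least one 'no').
    i = g.reset_index(drop=True).Flag.eq('no').iloc[::-1].idxmax()
    # Clamp at 0 so a 'no' near the top of the group does not give a negative slice start.
    return g.iloc[max(i - X, 0):i + 1, :]
df.groupby('GroupID', group_keys=False).apply(prevK)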
  StringCol  Timestamp  GroupID Flag
1       abc   20170101      123  yes
2       def   20170101      123  yes
3       ghi   20170101      123   no
4       abc   20170101      124  yes
5       jkl   20170101      124  yes
6       pqr   20170101      124   no
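One thing to watch with idxmax: if a group has no "no" at all, the mask is all False and idxmax still returns a position, so rows from that group would be returned anyway. A possible variant that skips such groups is sketched below. The helper name last_no_with_context is hypothetical, and it uses a plain loop with pd.concat instead of apply, which is a substitution and not the approach above.

```python
import numpy as np
import pandas as pd

def last_no_with_context(df, x, group_col="GroupID", flag_col="Flag"):
    """Last 'no' row of each group plus up to x rows before it."""
    pieces = []
    for _, g in df.groupby(group_col, sort=False):
        # Positions (0-based within the group) of rows flagged 'no'
        pos = np.flatnonzero(g[flag_col].eq("no").to_numpy())
        if pos.size == 0:
            continue  # group has no 'no' row, so contribute nothing
        last = int(pos[-1])
        # Clamp the start so the slice never wraps around to the end of the group
        pieces.append(g.iloc[max(last - x, 0):last + 1])
    # Return an empty frame with the same columns if nothing matched
    return pd.concat(pieces) if pieces else df.iloc[0:0]

# Sample data from the question (integer dtypes are an assumption)
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})

out = last_no_with_context(df, 2)
print(out)
```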
Upvotes: 2