Reputation: 71
Another challenging question for me but perhaps an easy one for you. I just don't know how to think in a pythony way;
I have a data frame:
dt={'Name':['A','A','B','B','C','D','D'],'FG':['N','Y','N','Y','N','N','N'], 'Date':[2016,2017,2017,2016,2017,2017,2017]}
df=pd.DataFrame(data=dt,columns=['Name','FG', 'Date'])
What I want to do is, for each name in "Names" (which can be repetitive) check the flag value in "FG", if that is set to "Y" and the date associated with that is greater than the one associated with "N" I will keep the entry otherwise I will drop it. I cannot really think of a way to write this code simply due to lack of experience. To visualize here is what I have
Name FG Date
A N 2016
A Y 2017
B N 2017
B Y 2016
C Y 2017
D N 2017
D N 2017
Here is what I would like to return
Name FG Date
A Y 2017
C Y 2017
Thanks heaps guys
Upvotes: 1
Views: 211
Reputation: 323356
Add one more column restore the max
value within the group.
df['check']=df.groupby('Name').Date.transform('max')
df.loc[(df.check==df.Date)&(df.FG=='Y'),:]
Out[786]:
Name FG Date check
1 A Y 2017 2017
4 C Y 2017 2017
EDIT:
mask=df.groupby('Name').\
apply(lambda x : (x.FG=='Y')&(x.Date>min(x.Date)) if len(x.Date)>1 else (x.FG=='Y')).values
df[mask]
Out[808]:
Name FG Date
1 A Y 2017
4 C Y 2017
Data input
df
Out[809]:
Name FG Date
0 A N 2016
1 A Y 2017
2 B N 2017
3 B Y 2016
4 C Y 2017
5 D N 2017
6 D N 2017
7 E Y 2017
8 E N 2017
Upvotes: 1