Omido
Omido

Reputation: 71

Python Dataframe: clean data of one column by comparing values from other columns

Another challenging question for me but perhaps an easy one for you. I just don't know how to think in a pythony way;

I have a data frame:

dt={'Name':['A','A','B','B','C','D','D'],'FG':['N','Y','N','Y','N','N','N'], 'Date':[2016,2017,2017,2016,2017,2017,2017]}

df=pd.DataFrame(data=dt,columns=['Name','FG', 'Date'])

What I want to do is, for each name in "Names" (which can be repetitive) check the flag value in "FG", if that is set to "Y" and the date associated with that is greater than the one associated with "N" I will keep the entry otherwise I will drop it. I cannot really think of a way to write this code simply due to lack of experience. To visualize here is what I have

Name  FG  Date
A     N   2016
A     Y   2017
B     N   2017
B     Y   2016
C     Y   2017
D     N   2017
D     N   2017

Here is what I would like to return

Name  FG  Date
A     Y   2017
C     Y   2017

Thanks heaps guys

Upvotes: 1

Views: 211

Answers (1)

BENY
BENY

Reputation: 323356

Add one more column restore the max value within the group.

df['check']=df.groupby('Name').Date.transform('max')
df.loc[(df.check==df.Date)&(df.FG=='Y'),:]
Out[786]: 
  Name FG  Date  check
1    A  Y  2017   2017
4    C  Y  2017   2017

EDIT:

mask=df.groupby('Name').\
  apply(lambda x : (x.FG=='Y')&(x.Date>min(x.Date)) if len(x.Date)>1 else (x.FG=='Y')).values

df[mask]
Out[808]: 
  Name FG  Date
1    A  Y  2017
4    C  Y  2017

Data input

df
Out[809]: 
  Name FG  Date
0    A  N  2016
1    A  Y  2017
2    B  N  2017
3    B  Y  2016
4    C  Y  2017
5    D  N  2017
6    D  N  2017
7    E  Y  2017
8    E  N  2017

Upvotes: 1

Related Questions