Reputation: 91
I have a df like this:
ix y1 y2 id
ix1 X X AP10579
ix2 E E AP17998
ix3 C C AP283716
ix4 C C AP283716
ix5 E E AP17998
ix6 T T AP21187
ix7 X Z AP10579
ix8 T K AP21187
ix9 E E AP12457
ix10 C C Ap87930
in id column, we have two ids which are similar (f.x. ix1 & ix7 have the same id, ix2 & ix5, and so on) . also we have some unique ids,
I want to check if y1+y2 of each of these two ids are the same or not,
and if they are the same so move one of them in a new df,
also move every unique id,
so I should have a new df, df_new, like this:
ix y1 y2 id
ix2 E E AP17998
ix3 C C AP283716
ix9 E E AP12457
ix10 C C Ap87930
any suggestions is appreciated.
df = {
'ix': ['ix1','ix2','ix3','ix4','ix5','ix6','ix7','ix8','ix9','ix10'],
'y1': ['X','E','C','C','E','T','X','T', 'E','C'],
'y2': ['X','E','C','C','E','T','Z','K', 'E','C'],
'id': ['AP10579','AP17998','AP283716','AP283716','AP17998','AP21187','AP10579','AP21187', 'AP12457', 'Ap87930']
}
Upvotes: 1
Views: 42
Reputation: 3077
This is a possible approach:
df = pd.DataFrame({
'ix': ['ix1','ix2','ix3','ix4','ix5','ix6','ix7','ix8','ix9','ix10'],
'y1': ['X','E','C','C','E','T','X','T', 'E','C'],
'y2': ['X','E','C','C','E','T','Z','K', 'E','C'],
'id': ['AP10579','AP17998','AP283716','AP283716','AP17998','AP21187','AP10579','AP21187', 'AP12457', 'Ap87930']
})
def filter_df(g):
if len(g) == 1:
return g.iloc[0]
if g.y1.unique().size + g.y2.unique().size == 2:
return g.iloc[0]
df.groupby('id').agg(filter_df).dropna().reset_index()
output:
id ix y1 y2
0 AP12457 ix9 E E
1 AP17998 ix2 E E
2 AP283716 ix3 C C
3 Ap87930 ix10 C C
Upvotes: 1