Michael Dorner

Reputation: 20175

Merge duplicated pandas rows on specific rules

Given the following data frame

import pandas as pd

df = pd.DataFrame({
    'identifier': ['1', '2', None], 
    'name': ['Tom', 'Peter', 'Peter'], 
    'registered': [True, False, True]
})

the ultimate goal is to merge the rows that share a name according to certain rules, e.g. keep the non-null identifier and set registered to True if any of the merged rows is registered.

So the result should look like

df_result = pd.DataFrame({
    'identifier': ['1', '2'], 
    'name': ['Tom', 'Peter'], 
    'registered': [True, True]
})

I tried groupby, but maybe that is the wrong approach altogether?

drop_duplicates does not let me add specific rules.

Upvotes: 3

Views: 63

Answers (2)

jezrael

Reputation: 863291

I think you need a custom function that combines dropna, drop_duplicates and any:

df = pd.DataFrame({
    'identifier': ['1', '2', None, '2'], 
    'name': ['Peter', 'Peter', 'Peter', 'Peter'], 
    'registered': [True, False, True, True]
})
print(df)
  identifier   name  registered
0          1  Peter        True
1          2  Peter       False
2       None  Peter        True
3          2  Peter        True

def f(x):
    return pd.DataFrame({'identifier': x['identifier'].dropna().drop_duplicates(), 
                         'registered': x['registered'].any()})

df = df.groupby('name').apply(f).reset_index(level=1, drop=True).reset_index()
print(df)
    name identifier  registered
0  Peter          1        True
1  Peter          2        True
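As a quick check, here is a minimal sketch that runs the same idea on the frame from the question. The helper name merge_group and the result name merged are made up for illustration, and selecting the two columns before apply is my own choice to keep the grouping column out of the helper; it is not part of the answer above.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    'identifier': ['1', '2', None],
    'name': ['Tom', 'Peter', 'Peter'],
    'registered': [True, False, True],
})


def merge_group(group):
    # Hypothetical helper name; same logic as f above:
    # keep each distinct non-null identifier, and flag the group as
    # registered if any of its rows is registered.
    return pd.DataFrame({
        'identifier': group['identifier'].dropna().drop_duplicates(),
        'registered': group['registered'].any(),
    })


merged = (
    df.groupby('name')[['identifier', 'registered']]
      .apply(merge_group)
      .reset_index(level=1, drop=True)  # drop the original row labels
      .reset_index()                    # turn 'name' back into a column
)
print(merged)
```

Note that groupby sorts the group keys by default, so Peter comes before Tom in the result.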

Upvotes: 1

cs95

Reputation: 402872

Let's modify your data slightly.

df = pd.DataFrame({
    'identifier': ['1', None, '2'], 
    'name': ['Tom', 'Peter', 'Peter'], 
    'registered': [True, False, True]
})

df

  identifier   name  registered
0          1    Tom        True
1       None  Peter       False
2          2  Peter        True

Here None is the first identifier for "Peter". You can remedy this with a sort_values call, which places missing values last by default, and then call groupby + agg.

(df.sort_values(['identifier'])
   .groupby('name', as_index=False)
   .agg({'identifier': 'first', 'registered': 'any'}))

    name  registered identifier
0  Peter        True          2
1    Tom        True          1
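The same chain can be tried on the question's original frame. This is only a sketch: the name merged is illustrative, and sort=False is an addition of mine (not part of the answer above) so that the groups keep the order in which they appear after sorting, which happens to match the row order of df_result in the question.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    'identifier': ['1', '2', None],
    'name': ['Tom', 'Peter', 'Peter'],
    'registered': [True, False, True],
})

merged = (
    df.sort_values('identifier')  # missing identifiers are sorted last
      .groupby('name', as_index=False, sort=False)
      .agg({'identifier': 'first', 'registered': 'any'})
)

# Reorder the columns to match df_result from the question.
merged = merged[['identifier', 'name', 'registered']]
print(merged)
```

Passing the aggregations as strings ('first', 'any') uses the built-in groupby implementations rather than calling a Python function per group.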

Upvotes: 1
