Michael Dorner

Reputation: 20125

Merge two rows in pandas if None or nan

Given a DataFrame df

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0]
})

I want to group by id and merge the rows of each group wherever the only difference between them is a None or NaN in a column, so that the resulting df looks like this:

        age     id  name
id              
1   0   50.0    1   Peter
2   1   60.0    2   Max

Is there a neater solution than mine?

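def f(group):
    # Work on a copy so we do not modify the slice pandas passes in
    group = group.copy()

    # Distinct non-missing names in this group (dropna() removes None and NaN)
    names = set(group['name'].dropna())
    if len(names) == 1:
        group['name'] = names.pop()
    else:
        print('Error: Names are not mergeable:', names)

    # Distinct non-missing ages in this group
    ages = set(group['age'].dropna())
    if len(ages) == 1:
        group['age'] = ages.pop()
    else:
        print('Error: Ages are not mergeable:', ages)

    # After filling, the rows of a mergeable group are identical, so keep one
    return group.drop_duplicates()

df.groupby('id').apply(f)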
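The per-group mergeability checks in f can also be run for all groups at once. A minimal sketch, assuming "mergeable" means every column has at most one distinct non-missing value per id; the names counts, conflicts and mergeable are illustrative, not from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0],
})

# nunique() counts distinct values per column within each group and
# ignores None/NaN by default (dropna=True).
counts = df.groupby('id').nunique()

# A group is mergeable only if no column has more than one distinct
# non-missing value.
conflicts = counts.gt(1)
mergeable = ~conflicts.any(axis=1)

# Report the offending columns of every non-mergeable group
for group_id, row in conflicts[~mergeable].iterrows():
    print('Error: group', group_id, 'is not mergeable in', list(row[row].index))
```

For the example data nothing is printed, because both groups have at most one non-missing value per column.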

Upvotes: 1

Views: 1736

Answers (2)

Bharath M Shetty

Reputation: 30605

This is probably the slowest solution: inside groupby, sort the NaN values of each column to the end and then drop them, i.e.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 1, 2],
    'name': ['Peter', 'Max', None, 'Daniel', 'Sign'],
    'age': [50.0, np.nan, 60.0, 40, 30]
})
#    age  id    name
#0  50.0   1   Peter
#1   NaN   2     Max  
#2  60.0   2    None
#3  40.0   1  Daniel
#4  30.0   2    Sign

df.groupby('id').apply(lambda x: x.apply(sorted,key=pd.isnull).dropna()).reset_index(drop=True)

    age  id    name
0  50.0   1   Peter
1  40.0   1  Daniel
2  60.0   2     Max
3  30.0   2    Sign
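The key step is sorted(..., key=pd.isnull), which x.apply calls on each column of a group separately. A small sketch on the two columns of group id 2 from the example above:

```python
import numpy as np
import pandas as pd

ages = pd.Series([np.nan, 60.0, 30.0])
names = pd.Series(['Max', None, 'Sign'])

# sorted() is stable and False < True, so non-null values (key False) keep
# their relative order and the nulls (key True) move to the end.
print(sorted(ages, key=pd.isnull))   # [60.0, 30.0, nan]
print(sorted(names, key=pd.isnull))  # ['Max', 'Sign', None]
```

Because every column is sorted independently, the original row pairing is not kept: the k-th non-null name ends up next to the k-th non-null age of the group (here Max is paired with 60.0, which originally belonged to the None row).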

Upvotes: 1

BENY

Reputation: 323226

Use groupby + first; first() takes the first non-null value of each column within each group:

df.groupby('id').first()
Out[877]: 
     age   name
id             
1   50.0  Peter
2   60.0    Max
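A sketch of the same idea that keeps id as a regular column (via as_index=False), together with one caveat: first() does not detect conflicts. The clash frame below is made-up data used only to illustrate that case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0],
})

# as_index=False keeps 'id' as a column instead of moving it to the index
merged = df.groupby('id', as_index=False).first()
print(merged)

# Caveat: two different non-null values in one group are not reported;
# first() silently keeps the earlier one ('Ann' here).
clash = pd.DataFrame({'id': [3, 3], 'name': ['Ann', 'Bob']})
print(clash.groupby('id').first())
```

If conflicting values must be flagged, as the question's f does, a per-group nunique() check can be run before calling first().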

Upvotes: 1
