Reputation: 20125
For a given data frame df:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0]
})
I want to groupby id and combine the rows of a group if they differ only by a None or nan in a column, so that the resulting df looks like:
       age  id   name
id
1  0  50.0   1  Peter
2  1  60.0   2    Max
Is there a neater solution than mine?
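def f(df):
    df = df.copy()  # work on a copy so the original group is not modified
    # Distinct non-null names in this group
    names = set(df['name']) - {None}
    if len(names) == 1:
        df['name'] = names.pop()
    else:
        print('Error: Names are not mergeable:', names)
    # Distinct non-NaN ages in this group
    ages = {age for age in df['age'] if not np.isnan(age)}
    if len(ages) == 1:
        df['age'] = ages.pop()
    else:
        print('Error: Ages are not mergeable:', ages)
    # Rows are identical after filling, so keep only one per group
    df = df.drop_duplicates()
    return df

df.groupby('id').apply(f)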
Upvotes: 1
Views: 1736
Reputation: 30605
This is probably the slowest solution, but you can sort the NaNs to the end of each column and drop them inside the groupby, i.e.
df = pd.DataFrame({
    'id': [1, 2, 2, 1, 2],
    'name': ['Peter', 'Max', None, 'Daniel', 'Sign'],
    'age': [50.0, np.nan, 60.0, 40, 30]
})
#     age  id    name
# 0  50.0   1   Peter
# 1   NaN   2     Max
# 2  60.0   2    None
# 3  40.0   1  Daniel
# 4  30.0   2    Sign
df.groupby('id').apply(lambda x: x.apply(sorted,key=pd.isnull).dropna()).reset_index(drop=True)
    age  id    name
0  50.0   1   Peter
1  40.0   1  Daniel
2  60.0   2     Max
3  30.0   2    Sign
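To illustrate the sorting step on its own, here is a minimal sketch on plain lists (the variable names are mine, not from the answer). `pd.isnull` returns False for real values and True for None/NaN, and `sorted` is stable, so using it as the key pushes the missing entries to the end while keeping the other values in their original order:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one group's columns
names = ['Max', None, 'Sign']
ages = [np.nan, 60.0, 30.0]

# key=pd.isnull maps values to False and missing entries to True;
# False sorts before True, so nulls end up last
sorted_names = sorted(names, key=pd.isnull)
sorted_ages = sorted(ages, key=pd.isnull)

print(sorted_names)
print(sorted_ages)
```

After this reordering, each row pairs the remaining non-null values, and the trailing rows that still contain a null are what `dropna()` removes.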
Upvotes: 1
Reputation: 323226
Use groupby + first, which takes the first non-null value of each column within each group:
df.groupby('id').first()
Out[877]:
     age   name
id
1   50.0  Peter
2   60.0    Max
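For completeness, a self-contained sketch using the data from the question. The `nunique` check is my own addition, not part of the answer above: it assumes you also want to flag groups whose values conflict, roughly what the error messages in the question's function do. `nunique` ignores missing values by default, so a count above 1 means a group holds more than one distinct value in that column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0],
})

grouped = df.groupby('id')

# first() takes the first non-null entry of each column per group
merged = grouped.first()

# Assumed extra check: a group is "not mergeable" if any column
# has more than one distinct non-null value
conflicts = grouped.nunique().gt(1).any(axis=1)
unmergeable_ids = conflicts[conflicts].index.tolist()

print(merged)
print('Not mergeable:', unmergeable_ids)
```

Note that without such a check, `first()` silently keeps the first value when a group holds conflicting entries.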
Upvotes: 1