Reputation: 1
I have a data frame like this:
| temp_id | administ_div | municipality | combine |
|---|---|---|---|
| 1 | Gunma | Maebashi | |
| NaN | Gunma | Maebashi | CC |
| NaN | Gunma | Maebashi | DD |
| NaN | Gunma | Maebashi | EE |
| 2 | Gunma | Kiryu | AA |
How do I group by administ_div and municipality to get this (temp_id must not be NaN):
| temp_id | administ_div | municipality | combine |
|---|---|---|---|
| 1 | Gunma | Maebashi | [CC,DD,EE] |
| 2 | Gunma | Kiryu | AA |
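For reference, a minimal sketch reconstructing this sample frame (it assumes the blank combine cell is NaN rather than an empty string):

```python
import numpy as np
import pandas as pd

# Sample frame reproducing the question's data; the empty
# "combine" cell in the first row is assumed to be NaN.
df = pd.DataFrame({
    'temp_id': [1, np.nan, np.nan, np.nan, 2],
    'administ_div': ['Gunma'] * 5,
    'municipality': ['Maebashi'] * 4 + ['Kiryu'],
    'combine': [np.nan, 'CC', 'DD', 'EE', 'AA'],
})
```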
Upvotes: 0
Views: 492
Reputation: 13242
# forward-fill temp_id so rows with NaN inherit the id above them,
# then collect each group's non-null combine values into a list
out = (df.assign(temp_id=df.temp_id.ffill())
         .groupby(['temp_id', 'administ_div', 'municipality'], as_index=False)['combine']
         .agg(lambda x: x.dropna().tolist()))
print(out)
Output:
temp_id administ_div municipality combine
0 1.0 Gunma Maebashi [CC, DD, EE]
1 2.0 Gunma Kiryu [AA]
Upvotes: 0
Reputation: 260455
You can use a custom groupby.agg with the help of ffill:
# aggregate all columns with first value
agg_funcs = {c: 'first' for c in df}
# except "combine", which becomes a list of the non-null, non-empty values
# (pd.notna is needed because NaN is truthy, so "if x" alone would keep it)
agg_funcs['combine'] = lambda l: [x for x in l if pd.notna(x) and x]
# group by filled values and aggregate
out = (df
.groupby(df['temp_id'].ffill(), as_index=False)
.agg(agg_funcs)
)
output:
temp_id administ_div municipality combine
0 1.0 Gunma Maebashi [CC, DD, EE]
1 2.0 Gunma Kiryu [AA]
Upvotes: 2
Reputation: 527
try:
df['temp_id'] = df['temp_id'].ffill()
df1 = (df.groupby(['temp_id', 'administ_div', 'municipality'])['combine']
         .apply(lambda x: [i for i in x if not pd.isna(i)][0]
                if len(x) == 1
                else [i for i in x if not pd.isna(i)])
         .reset_index())
df1
temp_id administ_div municipality combine
0 1.0 Gunma Maebashi [CC, DD, EE]
1 2.0 Gunma Kiryu AA
Upvotes: 0