Jimmy
Jimmy

Reputation: 1

Pandas group by and keep ID

I have a data frame like this:

temp_id administ_div municipality combine
1 Gunma Maebashi
NaN Gunma Maebashi CC
NaN Gunma Maebashi DD
NaN Gunma Maebashi EE
2 Gunma Kiryu AA

How do I group by administ_div, municipality to this (temp_id must be not NaN):

temp_id administ_div municipality combine
1 Gunma Maebashi [CC,DD,EE]
2 Gunma Kiryu AA

Upvotes: 0

Views: 492

Answers (3)

BeRT2me
BeRT2me

Reputation: 13242

out = (df.assign(temp_id=df.temp_id.ffill())
         .groupby(['temp_id', 'administ_div', 'municipality'], as_index=False)['combine']
         .agg(lambda x: x.dropna().tolist()))
print(out)

Output:

   temp_id administ_div municipality       combine
0      1.0        Gunma     Maebashi  [CC, DD, EE]
1      2.0        Gunma        Kiryu          [AA]

Upvotes: 0

mozway
mozway

Reputation: 260455

You can use a custom groupby.agg with help of ffill:

# aggregate all columns with first value
agg_funcs = {c: 'first' for c in df}
# except "combine" as list of non-null values
agg_funcs['combine'] = lambda l: [x for x in l if x]

# group by filled values and aggregate
out = (df
 .groupby(df['temp_id'].ffill(), as_index=False)
 .agg(agg_funcs)
)

output:

   temp_id administ_div municipality       combine
0      1.0        Gunma     Maebashi  [CC, DD, EE]
1      2.0        Gunma        Kiryu          [AA]

Upvotes: 2

Khaled Koubaa
Khaled Koubaa

Reputation: 527

try:

df['temp_id'] = df['temp_id'].ffill()
df1 = df.groupby(['temp_id', 'administ_div', 'municipality'])['combine'].apply(lambda x: [i for i in x if not pd.isna(i)][0] if len(x)==1 else [i for i in x if not pd.isna(i)]).reset_index()

df1
    temp_id administ_div    municipality    combine
0   1.0     Gunma           Maebashi        [CC, DD, EE]
1   2.0     Gunma           Kiryu           AA

Upvotes: 0

Related Questions