Reputation: 31
I want to group the following output by material_id keeping the unique values of material_description and MPN, but list the plant_id. picture for reference
def search_output(materials):
df=pd.DataFrame(materials)
df_ref = df.loc[:, df.columns!='@search.score'].groupby('material_id').agg({lambda
x:list(x)})
return df_ref
This currently groups by material_id and list other columns.
The following code i use to keep unique values grouped by material_id, but now I am missing the plant_id list column.
df_t = df.loc[:, df.columns!='@search.score'].groupby('material_id' ['material_description','MPN'].agg(['unique'])
I'm looking for a way to combine the two. A way to group by a column, keep unique values of specific columns and list other columns at the same time.
Hope you can help - and sorry for the pictures, but can't figure out how to add output otherwise :)
Upvotes: 0
Views: 36
Reputation: 862511
You can create dictionary by lists - first for aggregation by unique
and for all another columns by list
with dict.fromkeys
, join them an pass to GroupBy.agg
:
print (df)
material_id material_description MPN A B
0 1 descr1 a b c
1 1 descr2 a d e
2 1 descr1 b b c
3 2 descr3 a b c
4 2 descr4 a b c
5 2 descr4 a b c
u_cols = ['material_description','MPN']
d = {c: 'unique' if c in u_cols else list for c in df.columns.drop('material_id')}
df_ref = df.loc[:, df.columns!='@search.score'].groupby('material_id').agg(d)
print (df_ref)
material_description MPN A B
material_id
1 [descr1, descr2] [a, b] [b, d, b] [c, e, c]
2 [descr3, descr4] [a] [b, b, b] [c, c, c]
Upvotes: 1