John_gis
John_gis

Reputation: 127

Create a concatenation of values in pandas based on values of another column

I have 2 dataframes

df1

MATERIAL    CODE
paper       a
plastic     b
metal       c
wood        d
glass       e

df2

NAME    CODE
andy    a,b,d
roger   
danny   c,d
cole    e

I want to update df2 with the materials column and concatenate where necessary

Intended result:

NAME    CODE    MATERIAL
andy    a,b,d   paper, plastic, wood
roger       
danny   c,d     metal, wood
cole    e       glass

Upvotes: 1

Views: 19

Answers (1)

David Erickson
David Erickson

Reputation: 16683

  1. Use str.split to separate the stings into a list in preparation for explode()
  2. Then, explode() and merge()
  3. Finally, groupby and agg the columns back to strings with join

df2['CODE'] = df2['CODE'].str.split(',')
df3 = (df2.explode('CODE').merge(df1, how='left',on='CODE').fillna('')
       .groupby('NAME', sort=False, as_index=False)
       .agg({'CODE' : lambda x: ','.join(x), 'MATERIAL' : lambda x: ','.join(x)}))
df3
Out[1]: 
    NAME   CODE            MATERIAL
0   andy  a,b,d  paper,plastic,wood
1  roger                           
2  danny    c,d          metal,wood
3   cole      e               glass

Upvotes: 1

Related Questions