Merge 2 CSV files with mapped values in another file separated by comma

Question

here is my problem:

I have tow csv files as follows:

Book1.csv

Id  Product
0   aaaa
1   bbbb
2   cccc
3   dddd

Book2.csv

Id  Attribute
0   aaad
0   sssd
1   fffd
1   gggd
1   cccd
2   bbbd
3   hhhd
3   bbbd

I want merge above files and get an output file as this:

Product Attributes
aaaa    aaad, sssd
bbbb    fffd, gggd, cccd
cccc    bbbd
dddd    hhhd, bbbd

The code I am using right now is:

import pandas as pd

a = pd.read_csv("Book1.csv")
b = pd.read_csv("Book2.csv")
b = b.dropna(axis=0)
merged = a.merge(b, how='left', left_on='Id', right_on='Id' )
merged.rename(columns={
                 'Product': 'Product',
                 'Attribute': 'Attributes'}, inplace=True)
merged = merged[['Product','Attributes']]
merged.to_csv("output.csv", index=False)

what I get from this is :

Product Attributes
aaaa    aaad
aaaa    sssd
bbbb    fffd
bbbb    gggd
bbbb    cccd
cccc    bbbd
dddd    hhhd
dddd    bbbd

All the Attributes and Products are merged correctly. But what I want is merge Attibutes into one string and separate by comma (not line by line). How do I do this? Thank you in advance!

jpp · Accepted Answer

This is one way.

g = df2.groupby('Id')['Attribute'].apply(', '.join)
df1['Attributes'] = df1['Id'].map(g)

Result

   Id Product          Attributes
0   0    aaaa          aaad, sssd
1   1    bbbb    fffd, gggd, cccd
2   2    cccc                bbbd
3   3    dddd          hhhd, bbbd

If you want to just combine to list, you can use this instead, though it won't print nicely:

g = df2.groupby('Id')['Attribute'].apply(list)

Explanation

Group df2 Attributes by Id and aggregate to list.
Map to column in df1 via pd.Series.map.

Merge 2 CSV files with mapped values in another file separated by comma

Answers (1)

Related Questions