Reputation: 97
here is my problem:
I have tow csv files as follows:
Book1.csv
Id Product
0 aaaa
1 bbbb
2 cccc
3 dddd
Book2.csv
Id Attribute
0 aaad
0 sssd
1 fffd
1 gggd
1 cccd
2 bbbd
3 hhhd
3 bbbd
I want merge above files and get an output file as this:
Product Attributes
aaaa aaad, sssd
bbbb fffd, gggd, cccd
cccc bbbd
dddd hhhd, bbbd
The code I am using right now is:
import pandas as pd
a = pd.read_csv("Book1.csv")
b = pd.read_csv("Book2.csv")
b = b.dropna(axis=0)
merged = a.merge(b, how='left', left_on='Id', right_on='Id' )
merged.rename(columns={
'Product': 'Product',
'Attribute': 'Attributes'}, inplace=True)
merged = merged[['Product','Attributes']]
merged.to_csv("output.csv", index=False)
what I get from this is :
Product Attributes
aaaa aaad
aaaa sssd
bbbb fffd
bbbb gggd
bbbb cccd
cccc bbbd
dddd hhhd
dddd bbbd
All the Attributes and Products are merged correctly. But what I want is merge Attibutes into one string and separate by comma (not line by line). How do I do this? Thank you in advance!
Upvotes: 1
Views: 83
Reputation: 164683
This is one way.
g = df2.groupby('Id')['Attribute'].apply(', '.join)
df1['Attributes'] = df1['Id'].map(g)
Result
Id Product Attributes
0 0 aaaa aaad, sssd
1 1 bbbb fffd, gggd, cccd
2 2 cccc bbbd
3 3 dddd hhhd, bbbd
If you want to just combine to list
, you can use this instead, though it won't print nicely:
g = df2.groupby('Id')['Attribute'].apply(list)
Explanation
df2
Attributes by Id and aggregate to list.df1
via pd.Series.map
.Upvotes: 2