Reputation: 1257
(This post does not explain what was requested below. There is the issue of merge before concatenation or sum, using these functions together to fill a column is the question.)
I have two dataframes that look like these (obviously these are examples):
data1 = {'Name': ['Alex','Alex','Cristiano','Cristiano','Fernando','Jonas','William'], 'Color': ['Blue','Red','Black','Blue','Yellow','Pink','Green'],
'Codes': ['','','','','','',''], 'Values': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
data2 = {'Name': ['Jonas','Alex','Cristiano','Cristiano','Alex'], 'Color': ['Pink','Red','Black','Blue','Red'],
'Codes': ['1456','1450','1453','1530','1459'], 'Values': [12000.00,5000.50,78000.00,2000.00,1500.00]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
Are so:
df1:
Name Color Codes Values
0 Alex Blue NaN
1 Alex Red NaN
2 Cristiano Black NaN
3 Cristiano Blue NaN
4 Fernando Yellow NaN
5 Jonas Pink NaN
6 William Green NaN
df2:
Name Color Codes Values
0 Jonas Pink 1456 12000.0
1 Alex Red 1450 5000.5
2 Cristiano Black 1453 78000.0
3 Cristiano Blue 1530 2000.0
4 Alex Red 1459 1500.0
I want to fill the 'Codes' and 'Values' columns of df1 with the corresponding values in df2 ('Code' and 'Values'), doing the matching of the columns 'Name' and 'Color'.
The problem is that there is more than one correspondence, for the column 'Codes' I want the concatenation of strings in 'Code' (separating with a comma), for the column 'Values' I want the sum of the correspondences.
I tried using merge, but this second part I don't know how to implement. To be clearer, the desired output of df1 is this:
Name Color Codes Values
0 Alex Blue NaN
1 Alex Red 1450 , 1459 6500.5
2 Cristiano Black 1453 78000.0
3 Cristiano Blue 1530 2000.0
4 Fernando Yellow NaN
5 Jonas Pink 1456 12000.0
6 William Green NaN
Upvotes: 1
Views: 71
Reputation: 13821
You can use pd.merge()
using df1
as it is and a grouped version of df2
(grouped by Name and Color). The aggregation should be list
on Codes and sum
on Values:
pd.merge(df1,
df2.groupby(['Name','Color']).agg({'Codes':list,'Values':'sum'}).reset_index(),
how='left',
on=['Name','Color'],
suffixes=('_x', ''))\
.drop(['Codes_x','Values_x'],axis=1)
Which prints back:
Name Color Codes Values
0 Alex Blue NaN NaN
1 Alex Red [1450, 1459] 6500.5
2 Cristiano Black [1453] 78000.0
3 Cristiano Blue [1530] 2000.0
4 Fernando Yellow NaN NaN
5 Jonas Pink [1456] 12000.0
6 William Green NaN NaN
Upvotes: 1