Reputation: 99
I have three datasets:
df1
name A B C D
0 Jeffrey Ray 0.171130 0.307791 -0.031378 0.366607
1 Crystal Roberts 0.429614 1.674201 0.176098 -1.289623
2 Paul Choute 0.396332 0.879003 1.889926 0.745100
3 Amy Clements 1.452253 0.724694 -0.830645 -0.342138
4 Stanley Coston -0.135439 -0.293293 0.684627 0.538049
df2
name A B C D
0 Jeffrey Ray 0.173430 0.399791 -0.012278 0.88897
1 Sara Flicker 0.635744 1.699001 0.176098 -1.11113
2 Jack Sether 0.323333 0.879003 1.889926 0.983640
3 Amy Clements 1.452253 0.784734 -0.673695 -0.342138
4 Stanley Coston -0.143222 -0.293293 0.683647 0.638479
df3
name A B C D
0 Jeffrey Ray 0.638273 0.687777 0.766666 0.900032
1 Sara Flicker 0.635744 1.699001 0.176098 -1.11113
2 Samantha Runyon 0.323333 -0.879003 1.889926 -0.683640
3 Amy Clements -1.452253 0.784734 -0.673695 -0.342138
4 Stanley Coston -0.143222 -0.282222 0.683647 -0.638479
I want to match on name across the three datasets and, for each name that appears in all of them, get the sum of D:
Jeffrey Ray 0.366607+0.88897+0.900032
Amy Clements -0.342138+(-0.342138)+(-0.342138)
Stanley Coston 0.538049+0.638479+(-0.638479)
I tried pd.merge:
pd.merge(df1, df2, on='name', how='inner').sum().reset_index()
But the output is not what I expected. Any suggestions on how I should change it? Another issue is that pd.merge can only combine two datasets at a time; is there an efficient way to combine several datasets at once without a for loop?
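For reference, the three frames above can be reconstructed like this (a sketch with the values copied from the printouts), so the answers can be run as-is:
import pandas as pd

cols = ['name', 'A', 'B', 'C', 'D']
df1 = pd.DataFrame([
    ['Jeffrey Ray', 0.171130, 0.307791, -0.031378, 0.366607],
    ['Crystal Roberts', 0.429614, 1.674201, 0.176098, -1.289623],
    ['Paul Choute', 0.396332, 0.879003, 1.889926, 0.745100],
    ['Amy Clements', 1.452253, 0.724694, -0.830645, -0.342138],
    ['Stanley Coston', -0.135439, -0.293293, 0.684627, 0.538049],
], columns=cols)
df2 = pd.DataFrame([
    ['Jeffrey Ray', 0.173430, 0.399791, -0.012278, 0.888970],
    ['Sara Flicker', 0.635744, 1.699001, 0.176098, -1.111130],
    ['Jack Sether', 0.323333, 0.879003, 1.889926, 0.983640],
    ['Amy Clements', 1.452253, 0.784734, -0.673695, -0.342138],
    ['Stanley Coston', -0.143222, -0.293293, 0.683647, 0.638479],
], columns=cols)
df3 = pd.DataFrame([
    ['Jeffrey Ray', 0.638273, 0.687777, 0.766666, 0.900032],
    ['Sara Flicker', 0.635744, 1.699001, 0.176098, -1.111130],
    ['Samantha Runyon', 0.323333, -0.879003, 1.889926, -0.683640],
    ['Amy Clements', -1.452253, 0.784734, -0.673695, -0.342138],
    ['Stanley Coston', -0.143222, -0.282222, 0.683647, -0.638479],
], columns=cols)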
Thanks
Upvotes: 1
Views: 442
Reputation: 765
# Move name into the index so the frames align row-wise by name.
df1 = df1.set_index('name')
df2 = df2.set_index('name')
df3 = df3.set_index('name')
# Keep only the names present in all three frames.
idx1 = df1.index.intersection(df2.index).intersection(df3.index)
# Element-wise sum of the aligned rows.
(df1.loc[idx1] + df2.loc[idx1] + df3.loc[idx1]).reset_index()
name A B C D
0 Jeffrey Ray 0.982833 1.395359 0.723010 2.155609
1 Amy Clements 1.452253 2.294162 -2.178035 -1.026414
2 Stanley Coston -0.421883 -0.868808 2.051921 0.538049
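If only the summed D column is needed, as in the question, the same intersection can be restricted to that column; a small variant of the code above, reusing the already-set index and idx1:
# Same alignment, but only column D.
df1.loc[idx1, 'D'] + df2.loc[idx1, 'D'] + df3.loc[idx1, 'D']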
Upvotes: 1
Reputation: 120509
Use reduce from functools to repeatedly add the frames aligned on name:
from functools import reduce
# add aligns on name and leaves NaN where a name is missing from either side,
# so dropna keeps only the names common to all three frames.
out = reduce(
    lambda dfL, dfR: dfL.set_index('name').add(dfR.set_index('name')).reset_index(),
    [df1, df2, df3]
).dropna().set_index('name')['D']
print(out)
# Output:
name
Amy Clements -1.026414
Jeffrey Ray 2.155609
Stanley Coston 0.538049
Name: D, dtype: float64
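The dropna works because add without a fill_value leaves NaN wherever a name is missing from one side, so names not present in all three frames end up as all-NaN rows. If instead you wanted every name summed over whichever frames it appears in (not what the question asks for), a sketch of that variant would pass fill_value=0; out_all is just an illustrative name:
# Variant: keep every name; missing entries count as 0 instead of producing NaN.
out_all = reduce(
    lambda dfL, dfR: dfL.set_index('name').add(dfR.set_index('name'), fill_value=0).reset_index(),
    [df1, df2, df3]
).set_index('name')['D']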
Upvotes: 2
Reputation:
How about:
dfs = pd.concat([df1, df2, df3])
# Sum D per name, then keep only names that occur in all three frames.
out = dfs.groupby('name')['D'].sum()[dfs['name'].value_counts() == 3]
Output:
name
Amy Clements -1.026414
Jeffrey Ray 2.155609
Stanley Coston 0.538049
Name: D, dtype: float64
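This relies on each name appearing at most once per frame, which holds for the sample data; if a frame could contain duplicate names, value_counts() == 3 would over-count. A sketch of a variant that checks membership per frame instead (frames, common and dfs_all are illustrative names):
# Keep only names present in every individual frame, regardless of row counts.
frames = [df1, df2, df3]
common = set.intersection(*(set(d['name']) for d in frames))
dfs_all = pd.concat(frames)
out = dfs_all[dfs_all['name'].isin(common)].groupby('name')['D'].sum()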
Upvotes: 3
Reputation: 262164
You can concat only the D column, using the names as index. Use dropna to get rid of the names that are missing from any dataset, and sum:
dfs = [df1, df2, df3]
# Align the D columns on name; rows with any NaN (name missing somewhere) are dropped.
(pd.concat([d.set_index('name')['D'] for d in dfs], axis=1)
 .dropna()
 .sum(axis=1)
)
Output:
name
Jeffrey Ray 2.155609
Amy Clements -1.026414
Stanley Coston 0.538049
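As a small variant of the same idea: dropping the dropna keeps every name, since sum skips NaN by default, while min_count=3 makes the sum NaN unless all three values are present. The names aligned, partial_sums and strict_sums below are illustrative only:
aligned = pd.concat([d.set_index('name')['D'] for d in dfs], axis=1)
partial_sums = aligned.sum(axis=1)                        # every name, summed over available frames
strict_sums = aligned.sum(axis=1, min_count=3).dropna()   # only names present in all three frames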
Upvotes: 3