Reputation: 428
Assume I have to two dataframes named "country_to_country" and "country_area_mapping" as below.
>>> country_to_country
From To Volume
0 c1 c4 10
1 c2 c5 20
2 c3 c6 30
>>> country_area_mapping
Country Area
0 c1 a1
1 c2 a2
2 c3 a1
3 c4 a2
4 c5 a1
5 c6 a2
Expected output would be:
From To Volume
0 a2 a1 40
1 a2 a1 20
Where country names should be replaced with the corresponding area and then group-by the table using sum on Volume column.
My approach was to use "merge" two times once for "From" column and then merge the result on "To" column. Finally, applying "groupby" to sum same From-To areas volume. However, I feel there exists a shorter way to do so. Any suggestion for an alternative?
Upvotes: 1
Views: 396
Reputation: 403258
Create a mapping:
In [62]: mapping = dict(country_area_mapping[['Country', 'Area']].values)
Explicit indexing is done to guarantee order when generating the mapping. Now use df.replace
followed by df.groupby
:
In [64]: country_to_country.replace(mapping).groupby(['From', 'To'], as_index=False).sum()
Out[64]:
From To Volume
0 a1 a2 40
1 a2 a1 20
Upvotes: 1
Reputation: 863761
You can use DataFrame.replace
by Series
and then groupby
with aggregating sum
:
s = country_area_mapping.set_index('Country')['Area']
df = country_to_country.replace({'From':s, 'To':s})
.groupby(['From','To'], as_index=False)['Volume'].sum()
print (df)
From To Volume
0 a1 a2 40
1 a2 a1 20
Upvotes: 2