ARASH
ARASH

Reputation: 428

Rename the column's entries and then groupby in pandas

Assume I have to two dataframes named "country_to_country" and "country_area_mapping" as below.

>>> country_to_country
  From  To  Volume
0   c1  c4      10
1   c2  c5      20
2   c3  c6      30

>>> country_area_mapping
  Country  Area
0   c1     a1
1   c2     a2  
2   c3     a1  
3   c4     a2  
4   c5     a1  
5   c6     a2

Expected output would be:

  From  To  Volume
0   a2  a1      40
1   a2  a1      20

Where country names should be replaced with the corresponding area and then group-by the table using sum on Volume column.

My approach was to use "merge" two times once for "From" column and then merge the result on "To" column. Finally, applying "groupby" to sum same From-To areas volume. However, I feel there exists a shorter way to do so. Any suggestion for an alternative?

Upvotes: 1

Views: 396

Answers (2)

cs95
cs95

Reputation: 403258

Create a mapping:

In [62]: mapping = dict(country_area_mapping[['Country', 'Area']].values)

Explicit indexing is done to guarantee order when generating the mapping. Now use df.replace followed by df.groupby:

In [64]: country_to_country.replace(mapping).groupby(['From', 'To'], as_index=False).sum()
Out[64]: 
  From  To  Volume
0   a1  a2      40
1   a2  a1      20

Upvotes: 1

jezrael
jezrael

Reputation: 863761

You can use DataFrame.replace by Series and then groupby with aggregating sum:

s = country_area_mapping.set_index('Country')['Area']
df = country_to_country.replace({'From':s, 'To':s})
                       .groupby(['From','To'], as_index=False)['Volume'].sum()
print (df)
  From  To  Volume
0   a1  a2      40
1   a2  a1      20

Upvotes: 2

Related Questions