Reputation: 75
I have a multi-indexed dataframe like so
Value
Source Partner
USA DEU 20
CHN 10
MEX 5
DEU USA 12
CHN 6
MEX 2
CHN USA 1
DEU 2
MEX 3
I want to add rows to represent an aggregate of some core countries, say USA, DEU and CHN, by 'Source' as well as 'Partner'. The intended output of new rows is the following:
Value
Source Partner
CORE USA 13
DEU 22
CHN 16
USA CORE 30
CHN CORE 3
DEU CORE 18
Source and Partner are the two multi-indices.
Any tidy and quick way to generate the second output dataframe? In the actual application I have many more countries of course.
Upvotes: 1
Views: 62
Reputation: 13468
With your initial dataframe:
import pandas as pd
data = {
("USA", "DEU"): 20,
("USA", "CHN"): 10,
("USA", "MEX"): 5,
("DEU", "USA"): 12,
("DEU", "CHN"): 6,
("DEU", "MEX"): 2,
("CHN", "USA"): 1,
("CHN", "DEU"): 2,
("CHN", "MEX"): 3,
}
df = pd.DataFrame(list(data.values()), index=data.keys(), columns=["Value"])
df.index.names = ["Source", "Partner"]
Here is one way to do it:
CORE = ["USA", "DEU", "CHN"]
# Build first part of the expected dataframe
df1 = df.reset_index()
df1["Source"] = "CORE"
df1 = (
df1.loc[df1["Partner"].isin(CORE), :]
.groupby(["Source", "Partner"])
.agg(sum)
.sort_values("Partner", ascending=False)
)
print(df1)
# Output
# Build second part
df2 = df.reset_index()
df2 = (
df2.loc[df2["Partner"].isin(CORE), :]
.assign(Partner="CORE")
.groupby(["Source", "Partner"])
.agg(sum)
.reindex(
index=pd.MultiIndex.from_tuples(
[("USA", "CORE"), ("CHN", "CORE"), ("DEU", "CORE")],
names=["Source", "Partner"],
)
)
)
print(df2)
# Output
And then:
# Build final dataframe
new_df = pd.concat([df1, df2])
print(new_df)
# Output
Upvotes: 1