nutix
nutix

Reputation: 75

How to create new rows in multiindex DataFrames using existing data?

I have a multi-indexed dataframe like so

                          Value
Source       Partner              
USA          DEU          20
             CHN          10
             MEX          5
DEU          USA          12
             CHN          6
             MEX          2
CHN          USA          1
             DEU          2
             MEX          3

I want to add rows to represent an aggregate of some core countries, say USA, DEU and CHN, by 'Source' as well as 'Partner'. The intended output of new rows is the following:

                          Value
Source       Partner              
CORE         USA          13
             DEU          22
             CHN          16
USA          CORE         30
CHN          CORE         3
DEU          CORE         18

Source and Partner are the two multi-indices.

Any tidy and quick way to generate the second output dataframe? In the actual application I have many more countries of course.

Upvotes: 1

Views: 62

Answers (1)

Laurent
Laurent

Reputation: 13468

With your initial dataframe:

import pandas as pd

data = {
    ("USA", "DEU"): 20,
    ("USA", "CHN"): 10,
    ("USA", "MEX"): 5,
    ("DEU", "USA"): 12,
    ("DEU", "CHN"): 6,
    ("DEU", "MEX"): 2,
    ("CHN", "USA"): 1,
    ("CHN", "DEU"): 2,
    ("CHN", "MEX"): 3,
}

df = pd.DataFrame(list(data.values()), index=data.keys(), columns=["Value"])
df.index.names = ["Source", "Partner"]

Here is one way to do it:

CORE = ["USA", "DEU", "CHN"]

# Build first part of the expected dataframe
df1 = df.reset_index()
df1["Source"] = "CORE"
df1 = (
    df1.loc[df1["Partner"].isin(CORE), :]
    .groupby(["Source", "Partner"])
    .agg(sum)
    .sort_values("Partner", ascending=False)
)

print(df1)
# Output

enter image description here

# Build second part
df2 = df.reset_index()
df2 = (
    df2.loc[df2["Partner"].isin(CORE), :]
    .assign(Partner="CORE")
    .groupby(["Source", "Partner"])
    .agg(sum)
    .reindex(
        index=pd.MultiIndex.from_tuples(
            [("USA", "CORE"), ("CHN", "CORE"), ("DEU", "CORE")],
            names=["Source", "Partner"],
        )
    )
)

print(df2)
# Output

enter image description here

And then:

# Build final dataframe
new_df = pd.concat([df1, df2])
print(new_df)
# Output

enter image description here

Upvotes: 1

Related Questions