divingTobi
divingTobi

Reputation: 2300

Transform pandas groupby result with subtotals to relative values

I have come accross a nice solution to insert subtotals into a pandas groupby dataframe. However, now I would like to modify the result to show relative values with respect to the subtotals, instead of the absolute values.

This is the code to show the groupby:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Category": np.random.choice(["Group A", "Group B"], 50),
        "Product": np.random.choice(["Product 1", "Product 2"], 50),
        "Units_Sold": np.random.randint(1, 100, size=(50)),
        "Date": np.random.choice(
            pd.date_range("1/1/2011", "03/31/2011", freq="D"), 50, replace=False
        ),
    }
)

iList = ["Category", "Product"]

pvt = pd.concat(
    [df.assign(**{x: "" for x in iList[i:]}).groupby(iList).sum() for i in range(1, 3)]
).sort_index()

print(pvt)

which results into

                    Units_Sold
Category Product              
Group A                   1170
         Product 1         434
         Product 2         736
Group B                    980
         Product 1         437
         Product 2         543

I have tried something along the lines

pvt.transform(lambda x: (round(x / x['Group A', ''],2)*100).astype(int).astype(str)+"%")

but obviously this only calculates values relative to the first row.

What I am looking for is

                    Units_Sold
Category Product              
Group A                   100%
         Product 1         37%
         Product 2         63%
Group B                   100%
         Product 1         45%
         Product 2         55%

Thanks a lot!

Upvotes: 3

Views: 295

Answers (1)

jezrael
jezrael

Reputation: 863226

Use GroupBy.apply by first level of MultiIndex with lambda function:

f = lambda x: (x / x.iloc[0]).mul(100).round(2).astype(int).astype(str)+"%"
df = pvt.groupby(level=0).apply(f)
print (df)
                   Units_Sold
Category Product             
Group A                  100%
         Product 1        49%
         Product 2        50%
Group B                  100%
         Product 1        52%
         Product 2        47%

Or use GroupBy.transform with GroupBy.first:

df = (pvt.div(pvt.groupby(level=0).transform('first'))
         .mul(100)
         .round(2)
         .astype(int)
         .astype(str)+"%")
print (df)

                   Units_Sold
Category Product             
Group A                  100%
         Product 1        43%
         Product 2        56%
Group B                  100%
         Product 1        58%
         Product 2        41%

Upvotes: 4

Related Questions