Atharva Katre
Atharva Katre

Reputation: 587

Pandas add a total column after every year column in a multilevel dataframe

I have created the below pivotable in pandas

>>> out
Year  2021                   2022              2023            
Month  Feb Mar  Sep  Oct Dec  Jan Jun Aug  Oct  Jun Sep Nov Dec
ID                                                             
1        0   8  1.5  6.5   6    8   8   2  7.0    9   9   3   0
2        4   4  0.0  0.0   0    0   0   2  8.5    0   0   0   3

Code:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
months = pd.CategoricalDtype(months, ordered=True)

rng = np.random.default_rng(2023)
df = pd.DataFrame({'ID': rng.integers(1, 3, 20),
                   'Year': rng.integers(2021, 2024, 20),
                   'Month': rng.choice(months.categories, 20),
                   'Value': rng.integers(1, 10, 20)})
out = (df.astype({'Month': months})
        .pivot_table(index='ID', columns=['Year', 'Month'], values='Value',
                     aggfunc='mean', fill_value=0))

Now I would like add a total column after each year:

Year  2021          Total  2022                  Total  2023                  Total
Month  Feb Mar  Sep        Oct Dec  Jan Jun Aug         Oct  Jun Sep Nov Dec
ID                                                             
1        0   8  1.5  9.5   6.5   6    8   8   2   31.5  7.0    9   9   3   0   28
2        4   4  0.0   8    0.0   0    0   0   2   2     8.5    0   0   0   3   11.5

How could I get this? Thanks!

Upvotes: 0

Views: 32

Answers (2)

Corralien
Corralien

Reputation: 120391

Append a new category to months:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
          'Total']  # extra columns here before pivot
months = pd.CategoricalDtype(months, ordered=True)

# Pivot here
out = ...

tot = out.groupby(level='Year', axis=1).sum()
tot.columns = pd.MultiIndex.from_product([tot.columns, pd.Index(['Total'], name='Month', dtype=months)])
out = pd.concat([out, tot], axis=1).sort_index(axis=1)

Output:

>>> out

Year  2021                         2022                    2023                  
Month  Feb Mar  Sep  Oct Dec Total  Jan Jun Aug  Oct Total  Jun Sep Nov Dec Total
ID                                                                               
1        0   8  1.5  6.5   6  22.0    8   8   2  7.0  25.0    9   9   3   0  21.0
2        4   4  0.0  0.0   0   8.0    0   0   2  8.5  10.5    0   0   0   3   3.0

Upvotes: 1

Miguel Montes
Miguel Montes

Reputation: 154

You can do:

df_year = df.groupby(by = ['Year', 'ID']).sum(numeric_only = True)

Upvotes: 0

Related Questions