scarface du ghetto
scarface du ghetto

Reputation: 1

How to reshape multi index in a pandas dataframe like an excel pivot table

I have a dataframe where I have a multindexing of 2 or 3 levels and I would like to reshape it as the usual pivot table in Excel to be able to do 'inter' Grand Total (see the image). Excel Pivot table

I tried with the df.pivot_table() and the multiindexing through .groupby() but nothing conclusive

I only have that DataFrame

DataFrame

Here is the code

 df = pd.DataFrame({'Products': ['Products A','Products A', 
                           'Products A','Products B', 'Products B', 
                           'Products A', 'Products B', 'Products A'],

                   'Sub Products': ['Phone A','Phone B', 
                                   'Laptop B','Phone B', 'Laptop 
                                    B','Phone A','Phone B','Laptop A'],

                   'Color' : ['Green',  'Blue','Red',
                            'Red','Red','Blue','Green','Blue']})


df.groupby(['Products','Sub Products','Color' ]).count()

If you have any idea it would be super helpful ! Thanks.

Upvotes: 0

Views: 258

Answers (1)

evces
evces

Reputation: 402

In pandas you typically wouldn't include this aggregate information as part of the same grouped DataFrame, you would get it with a separate command afterwards, for instance:

grand_total = df.sum()

Note that the data you supplied in your question doesn't quite produce what you have in your images. The numbers differ and some of the A/B labeling was inconsistent. Below I edited the code you provided reproduce something that matches your images, assuming each row of the example data you supplied is one "unit".

df = pd.DataFrame(
    {
        "Products": [
            "Products A",
            "Products A",
            "Products A",
            "Products B",
            "Products B",
            "Products A",
            "Products B",
            "Products A",
        ],
        "Sub Products": [
            "Phone A",
            "Phone A",
            "Laptop A",
            "Phone B",
            "Laptop  B",
            "Phone A",
            "Phone B",
            "Laptop A",
        ],
        "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
    }
)
df['Count'] = 1
df = df.groupby(['Products','Sub Products','Color' ]).sum()

# To view the totals at any particular level of the multi-index
display(df.groupby(level=0)['Count'].sum())
display(df.groupby(level=1)['Count'].sum())
display(df.groupby(level=2)['Count'].sum())

This will give you the information you want... However from your comment, it sounds like you just want a particular display format. This is possible with the code below, but it loses the actual organization of the multi-indexed Data Frame:

out = pd.DataFrame(columns=['Product','Count'])
for n1, d1 in df.groupby(level=0):
  out = pd.concat([out,pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
  d1 = d1.droplevel(0)
  for n2, d2 in d1.groupby(level=0):
    out = pd.concat([out,pd.DataFrame({"Product": n2, "Count": d2.sum().values})])
    d2 = d2.droplevel(0)
    for n3, d3 in d2.groupby(level=0):
      out = pd.concat([out,pd.DataFrame({"Product": n3, "Count": d3.sum().values})])
display(out)

yields:

Product Count
0   Products A  5
0   Laptop A    2
0   Blue    1
0   Red 1
0   Phone A 3
0   Blue    2
0   Green   1
0   Products B  3
0   Laptop  B   1
0   Red 1
0   Phone B 2
0   Green   1
0   Red 1

even better, here is the recursive version of the above:

# recursive version for arbitrarily deep multi-index
def traverse_multiindex(d, out):
  for n1, d1 in d.groupby(level=0):
    out = pd.concat([out,pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
    if (d1.index.nlevels>1):
      d2 = d1.droplevel(0)
      out = traverse_multiindex(d2, out)
  return out

# initialize empty
out = pd.DataFrame(columns=['Product','Count'])
out = traverse_multiindex(df, out)
display(out)

Upvotes: 1

Related Questions