Reputation: 1
I have a dataframe where I have a multindexing of 2 or 3 levels and I would like to reshape it as the usual pivot table in Excel to be able to do 'inter' Grand Total (see the image).
I tried with the df.pivot_table()
and the multiindexing through .groupby()
but nothing conclusive
I only have that DataFrame
Here is the code
df = pd.DataFrame({'Products': ['Products A','Products A',
'Products A','Products B', 'Products B',
'Products A', 'Products B', 'Products A'],
'Sub Products': ['Phone A','Phone B',
'Laptop B','Phone B', 'Laptop
B','Phone A','Phone B','Laptop A'],
'Color' : ['Green', 'Blue','Red',
'Red','Red','Blue','Green','Blue']})
df.groupby(['Products','Sub Products','Color' ]).count()
If you have any idea it would be super helpful ! Thanks.
Upvotes: 0
Views: 258
Reputation: 402
In pandas you typically wouldn't include this aggregate information as part of the same grouped DataFrame, you would get it with a separate command afterwards, for instance:
grand_total = df.sum()
Note that the data you supplied in your question doesn't quite produce what you have in your images. The numbers differ and some of the A/B labeling was inconsistent. Below I edited the code you provided reproduce something that matches your images, assuming each row of the example data you supplied is one "unit".
df = pd.DataFrame(
{
"Products": [
"Products A",
"Products A",
"Products A",
"Products B",
"Products B",
"Products A",
"Products B",
"Products A",
],
"Sub Products": [
"Phone A",
"Phone A",
"Laptop A",
"Phone B",
"Laptop B",
"Phone A",
"Phone B",
"Laptop A",
],
"Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
}
)
df['Count'] = 1
df = df.groupby(['Products','Sub Products','Color' ]).sum()
# To view the totals at any particular level of the multi-index
display(df.groupby(level=0)['Count'].sum())
display(df.groupby(level=1)['Count'].sum())
display(df.groupby(level=2)['Count'].sum())
This will give you the information you want... However from your comment, it sounds like you just want a particular display format. This is possible with the code below, but it loses the actual organization of the multi-indexed Data Frame:
out = pd.DataFrame(columns=['Product','Count'])
for n1, d1 in df.groupby(level=0):
out = pd.concat([out,pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
d1 = d1.droplevel(0)
for n2, d2 in d1.groupby(level=0):
out = pd.concat([out,pd.DataFrame({"Product": n2, "Count": d2.sum().values})])
d2 = d2.droplevel(0)
for n3, d3 in d2.groupby(level=0):
out = pd.concat([out,pd.DataFrame({"Product": n3, "Count": d3.sum().values})])
display(out)
yields:
Product Count
0 Products A 5
0 Laptop A 2
0 Blue 1
0 Red 1
0 Phone A 3
0 Blue 2
0 Green 1
0 Products B 3
0 Laptop B 1
0 Red 1
0 Phone B 2
0 Green 1
0 Red 1
even better, here is the recursive version of the above:
# recursive version for arbitrarily deep multi-index
def traverse_multiindex(d, out):
for n1, d1 in d.groupby(level=0):
out = pd.concat([out,pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
if (d1.index.nlevels>1):
d2 = d1.droplevel(0)
out = traverse_multiindex(d2, out)
return out
# initialize empty
out = pd.DataFrame(columns=['Product','Count'])
out = traverse_multiindex(df, out)
display(out)
Upvotes: 1