magnustm1
magnustm1

Reputation: 25

Pandas subtotal similar to Excel

I have the following dataframe df:

     A   B      C
0    21  Blue   100 
1    33  Yellow 100 
2    17  White  250 
3    A2  Grey   40
4    65  Green  500 
5    33  Red    80 
6    17  Purple -50
7    A2  Orange 600

Column B is basically irrelevant information IRT the code itself but still needs to be included in the output. I have sorted the dataframe by column A and gotten around the issue that col A has contains both int and str:

df['A'] = df['A'].astype(str)
df_sorted = df.sort_values(by=['A'])

So now df_sorted looks like this:

     A   B      C
2    17  White  250
6    17  Purple -50
0    21  Blue   100
1    33  Yellow 100
5    33  Red    80
4    65  Green  500
3    A2  Grey   40
7    A2  Orange 600

My question is: How can I then make subtotals for each change in col A by summarizing col C similar to Excel's subtotal function? The final output of the dataframe should look like this:

     A        B      C
2    17       White  250
6    17       Purple -50
     Subtotal        200  
0    21       Blue   100
     Subtotal        100
1    33       Yellow 100
5    33       Red    80
     Subtotal        180
4    65       Green  500
     Subtotal        500
3    A2       Grey   40
7    A2       Orange 600
     Subtotal        640

Upvotes: 1

Views: 1512

Answers (2)

Bitwise Alex
Bitwise Alex

Reputation: 1

Though Umar's answer would work for this, a problem can arise if you have NA values in the B column.

Basically that solution adds a set of rows at the end, with column A, NA, and the sum of C. Sorts by column A, then replaces the values of A with "Subtotal" where B is NA.
So if there's already an NA value in column B, it will be renamed to "Subtotal" as well.

Another way to do this would be to split the dataframe into groups, append the subtotal to the end, and then combine the groups back together.

groups = df.groupby("A")
subtotals = []

for key, group in groups:
    subtotal_row = {"A": "Subtotal", "C": group["C"].sum()}
    subtotals.append(pd.concat([group, pd.DataFrame([subtotal_row])]))

df = pd.concat(subtotals)

Upvotes: 0

Umar.H
Umar.H

Reputation: 23099

you can concat your original df and the groupby subtotal.

df1 =  pd.concat([df,
             df.groupby(['A'],as_index=False)['C'].sum()]).sort_values('A')

df1.loc[df1['B'].isnull(), 'A'] = 'Subtotal'

print(df1.fillna(''))

          A       B    C
2        17   White  250
6        17  Purple  -50
0  Subtotal          200
0        21    Blue  100
1  Subtotal          100
1        33  Yellow  100
5        33     Red   80
2  Subtotal          180
4        65   Green  500
3  Subtotal          500
3        A2    Grey   40
7        A2  Orange  600
4  Subtotal          640

Upvotes: 4

Related Questions