Sahil Tariq
Sahil Tariq

Reputation: 138

How to get the sum of a DataFrame in Pandas using pivot_table function

test.py:

import pandas as pd
import numpy as np

df = pd.read_excel("C:\\Users\\Admin\\Desktop\\Horticulture\\test_data.xlsx")
df_pivot = pd.pivot_table(
    df,
    index=["sizeoforchardholding", "nooforchardists"],
    columns="fruit_plant_kind",
    values=["sizeoforchardholding", "nooforchardists"],
    aggfunc={"fruit_plant_count": sum},
    fill_value=0,
    margins=True,
).reset_index()
print(df_pivot)
result_pivot = df_pivot.to_dict("r")
print(result_pivot)

error:

Traceback (most recent call last):
  File "d:\Code\pandas test\test.py", line 5, in <module>
    df_pivot = pd.pivot_table(
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 95, in 
pivot_table
    table = __internal_pivot_table(
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 228, in __internal_pivot_table
    table = _add_margins(
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 272, in _add_margins
    grand_margin = _compute_grand_margin(data, values, aggfunc, margins_name)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 342, in _compute_grand_margin
    if isinstance(aggfunc[k], str):
KeyError: 'sizeoforchardholding'

this is the sample output of the data in thexlsx file:

print (df)
  fruit_plant_kind sizeoforchardholding  nooforchardists  fruit_plant_count
1            Apple      0.5 to 1.0 Acre                2                  6
2           Almond      0.5 to 1.0 Acre                2                500
3           Almond      1.0 to 1.5 Acre                1                300
4            Aonla      1.0 to 1.5 Acre                1                250

So what I want the output to be is to index the data according to the sizeoforchardholding and nooforchardists and the column to be the fruit_plant_kind and the value should be the sum of fruit_plant_kind according to these two indexes

eg:

                 sizeoforchardholding nooforchardists fruit_plant_count
fruit_plant_kind                                                 Almond Aonla  Apple Apple seedling Pear Plum Star fruit    
0                     0.5 to 1.0 Acre               1                 0     0    800              0    0  400          0    
1                     0.5 to 1.0 Acre               2               506     0    550              0    0    0          0    
2                     0.5 to 1.0 Acre               3                 0     0    235              0    0    0          0    

As you can see above in 0.5 to 1.0 Acre with nooforchardists=2 the Almonds = 506 is clearly the sum of the base table.

but now what I want is to get the sum of all the columns at the end. for example the sum of Apples=1585 according to the data above.

Upvotes: 0

Views: 290

Answers (1)

jezrael
jezrael

Reputation: 862801

You can use fruit_plant_count for parameter values instead ["sizeoforchardholding", "nooforchardists"]:

df_pivot = pd.pivot_table(df,                    
                          index=["sizeoforchardholding","nooforchardists"],
                          columns="fruit_plant_kind",                     
                          values="fruit_plant_count",                     
                          aggfunc='sum', 
                          margins=True, 
                          margins_name='Sum',                     
                          fill_value=0).reset_index().rename_axis(None, axis=1)
print (df_pivot)
  sizeoforchardholding nooforchardists  Almond  Aonla  Apple   Sum
0      0.5 to 1.0 Acre               2     500      0      6   506
1      1.0 to 1.5 Acre               1     300    250      0   550
2                  Sum                     800    250      6  1056

Upvotes: 1

Related Questions