Reputation: 138
test.py:
import pandas as pd
import numpy as np
df = pd.read_excel("C:\\Users\\Admin\\Desktop\\Horticulture\\test_data.xlsx")
df_pivot = pd.pivot_table(
df,
index=["sizeoforchardholding", "nooforchardists"],
columns="fruit_plant_kind",
values=["sizeoforchardholding", "nooforchardists"],
aggfunc={"fruit_plant_count": sum},
fill_value=0,
margins=True,
).reset_index()
print(df_pivot)
result_pivot = df_pivot.to_dict("r")
print(result_pivot)
error:
Traceback (most recent call last):
File "d:\Code\pandas test\test.py", line 5, in <module>
df_pivot = pd.pivot_table(
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 95, in
pivot_table
table = __internal_pivot_table(
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 228, in __internal_pivot_table
table = _add_margins(
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 272, in _add_margins
grand_margin = _compute_grand_margin(data, values, aggfunc, margins_name)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\pivot.py", line 342, in _compute_grand_margin
if isinstance(aggfunc[k], str):
KeyError: 'sizeoforchardholding'
this is the sample output of the data in thexlsx
file:
print (df)
fruit_plant_kind sizeoforchardholding nooforchardists fruit_plant_count
1 Apple 0.5 to 1.0 Acre 2 6
2 Almond 0.5 to 1.0 Acre 2 500
3 Almond 1.0 to 1.5 Acre 1 300
4 Aonla 1.0 to 1.5 Acre 1 250
So what I want the output to be is to index the data according to the sizeoforchardholding
and nooforchardists
and the column to be the fruit_plant_kind
and the value should be the sum of fruit_plant_kind
according to these two indexes
eg:
sizeoforchardholding nooforchardists fruit_plant_count
fruit_plant_kind Almond Aonla Apple Apple seedling Pear Plum Star fruit
0 0.5 to 1.0 Acre 1 0 0 800 0 0 400 0
1 0.5 to 1.0 Acre 2 506 0 550 0 0 0 0
2 0.5 to 1.0 Acre 3 0 0 235 0 0 0 0
As you can see above in 0.5 to 1.0 Acre
with nooforchardists=2
the Almonds = 506
is clearly the sum of the base table.
but now what I want is to get the sum of all the columns at the end. for example the sum of Apples=1585
according to the data above.
Upvotes: 0
Views: 290
Reputation: 862801
You can use fruit_plant_count
for parameter values
instead ["sizeoforchardholding", "nooforchardists"]
:
df_pivot = pd.pivot_table(df,
index=["sizeoforchardholding","nooforchardists"],
columns="fruit_plant_kind",
values="fruit_plant_count",
aggfunc='sum',
margins=True,
margins_name='Sum',
fill_value=0).reset_index().rename_axis(None, axis=1)
print (df_pivot)
sizeoforchardholding nooforchardists Almond Aonla Apple Sum
0 0.5 to 1.0 Acre 2 500 0 6 506
1 1.0 to 1.5 Acre 1 300 250 0 550
2 Sum 800 250 6 1056
Upvotes: 1