Reputation: 121
Attempting to summarize a pandas dataframe and calculate a "percent of total" column on the result of a groupby of the original df.
Original df:
Shape_Area LU
0 91254232.781776 Fallow Cropland
1 522096.071094 Mixed Wetland Hardwoods
2 87795.467187 Mixed Wetland Hardwoods
3 440.528367 Mixed Wetland Hardwoods
4 778952.154436 Dikes and Levees
Grouped result:
Shape_Area
LU
Dikes and Levees 778952.154436
Fallow Cropland 91254232.781776
Mixed Wetland Hardwoods 610332.066649
I'd like to add an additional "PCT of Total" column for each LU type. I'm not sure I'm accessing the groupby result correctly, probably not understanding what it is (a series?).
df = pd.DataFrame(narr, columns=['LU','Shape_Area'])
df = df.groupby(['LU'])[['Shape_Area']].sum()
#to print the example above after groupby
print df
Upvotes: 0
Views: 78
Reputation: 4011
You can simply calculate the sum of the Shape_Area
series (which returns a scalar), and divide each row of Shape_Area
in the grouped dataframe by this value.
grouped = df.groupby(['LU'])[['Shape_Area']].sum()
grouped['pct'] = grouped['Shape_Area'] / grouped['Shape_Area'].sum()
Shape_Area pct
LU
Dikes and Levees 7.789522e+05 0.008408
Fallow Cropland 9.125423e+07 0.985004
Mixed Wetland Hardwoods 6.103321e+05 0.006588
Upvotes: 2