user2309282
user2309282

Reputation: 121

Percent of total column after groupby

Attempting to summarize a pandas dataframe and calculate a "percent of total" column on the result of a groupby of the original df.

Original df:

        Shape_Area                       LU
0  91254232.781776          Fallow Cropland
1    522096.071094  Mixed Wetland Hardwoods
2     87795.467187  Mixed Wetland Hardwoods
3       440.528367  Mixed Wetland Hardwoods
4    778952.154436         Dikes and Levees

Grouped result:

                              Shape_Area
LU                                      
Dikes and Levees           778952.154436
Fallow Cropland          91254232.781776
Mixed Wetland Hardwoods    610332.066649

I'd like to add an additional "PCT of Total" column for each LU type. I'm not sure I'm accessing the groupby result correctly, probably not understanding what it is (a series?).

df = pd.DataFrame(narr, columns=['LU','Shape_Area'])
df = df.groupby(['LU'])[['Shape_Area']].sum()

#to print the example above after groupby
print df

Upvotes: 0

Views: 78

Answers (1)

Brendan
Brendan

Reputation: 4011

You can simply calculate the sum of the Shape_Area series (which returns a scalar), and divide each row of Shape_Area in the grouped dataframe by this value.

grouped = df.groupby(['LU'])[['Shape_Area']].sum()
grouped['pct'] = grouped['Shape_Area'] / grouped['Shape_Area'].sum()
                           Shape_Area       pct
LU                                             
Dikes and Levees         7.789522e+05  0.008408
Fallow Cropland          9.125423e+07  0.985004
Mixed Wetland Hardwoods  6.103321e+05  0.006588

Upvotes: 2

Related Questions