ah bon

Reputation: 10041

Groupby, value counts and calculate percentage in Pandas

I have grouped a dataframe by state and computed the value counts of industry within each state.

df.loc[df['state'].isin(['Alabama','Arizona'])].groupby(df['state'])['industry'].value_counts(sort = True)

Out:

state    industry                              
Alabama  Financial Services                        224
         Education                                   7
         Healthcare, Pharmaceuticals, & Biotech      5
         Business Services                           2
         Other                                       2
         Retail                                      2
         Government                                  1
         Manufacturing                               1
         Transportation & Storage                    1
Arizona  Healthcare, Pharmaceuticals, & Biotech     19
         Other                                      13
         Education                                   5
         Retail                                      5
         Transportation & Storage                    5
         Manufacturing                               4
         Travel, Recreation, and Leisure             4
         Consumer Services                           3
         Energy & Utilities                          2
         Financial Services                          2
         Government                                  2
         Business Services                           1
         Computers & Electronics                     1
         Software & Internet                         1
Name: industry, dtype: int64

Now I would like to go further and get each value count as a percentage. For example, for Alabama, I want the percentage of Financial Services, calculated as 224 / (224 + 7 + ... + 1), and likewise for the other industries and states.

How can I do that, either with new code or by modifying the code above? Thanks.

Upvotes: 1

Views: 658

Answers (1)

BENY

Reputation: 323306

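Add normalize=True to value_counts. Each count is then divided by the total for its state, so the result is a fraction between 0 and 1 (multiply by 100 for a percentage):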

df.loc[df['state'].isin(['Alabama','Arizona'])].groupby(df['state'])['industry'].value_counts(sort = True, normalize=True)
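To see what normalize=True does, here is a minimal sketch on a made-up dataframe (the rows below are hypothetical, not the asker's data). It also recomputes the shares by hand as count divided by the per-state total, which is the formula described in the question, so the two results can be compared.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's dataframe.
df = pd.DataFrame({
    "state": ["Alabama"] * 4 + ["Arizona"] * 4,
    "industry": ["Financial Services"] * 3 + ["Education"]
                + ["Retail"] * 2 + ["Other"] * 2,
})

subset = df.loc[df["state"].isin(["Alabama", "Arizona"])]

# normalize=True divides each count by the total for its state,
# giving fractions that sum to 1 within each state.
share = subset.groupby("state")["industry"].value_counts(normalize=True)

# Same values expressed as percentages, rounded for display.
pct = (share * 100).round(2)

# Cross-check by hand: raw counts divided by the per-state sum of counts.
counts = subset.groupby("state")["industry"].value_counts()
manual = counts / counts.groupby(level="state").transform("sum")

print(pct)
```

With this toy data, Financial Services accounts for 3 of the 4 Alabama rows, so its share is 0.75 (75%). The hand-computed `manual` series holds the same values as `share`.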

Upvotes: 2
