user3447653
user3447653

Reputation: 4148

Value counts of 2 columns in a pandas dataframe

I have a table in the below format. I would like to do the value counts of both the columns (year and operation) and get their percentage.

For example, in year 2014, value yes appears 2 out of 3 times, hence 2/3 = 0.66. I tried with value_counts but it did not yield the below results. Any leads would be appreciated.

df[['year', 'operation']].apply(pd.Series.value_counts)

year operation
2014    yes
2014    yes
2014    no
2015    
2015    yes
2015    yes

Result:

2014   yes     0.66
2014   no      0.33
2015           0.33
2015   yes     0.66

Upvotes: 2

Views: 5842

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35636

Let's try with SeriesGroupBy.value_counts and set normalize=True to get the values as a percentage:

out = df.groupby('year')['operation'].value_counts(normalize=True)

out:

year  operation
2014  yes          0.666667
      no           0.333333
2015  yes          0.666667
                   0.333333
Name: operation, dtype: float64

Can also set sort=False to not sort with highest value per level 0:

out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)

out:

year  operation
2014  no           0.333333
      yes          0.666667
2015               0.333333
      yes          0.666667
Name: operation, dtype: float64

Series.reset_index can be used with name= set to create a DataFrame instead of a Series and give a name to the unnamed values column:

new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
        .reset_index(name='freq')
)
   year operation      freq
0  2014       yes  0.666667
1  2014        no  0.333333
2  2015       yes  0.666667
3  2015            0.333333

DataFrame Used:

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

Upvotes: 6

Related Questions