skwolvie

Reputation: 139

How to plot proportions of data points using seaborn in Python

I created a plot with the code below; the result is shown after it:

Code:

%matplotlib inline
import seaborn as sns
sns.histplot(x='time_class', hue='Det_poiResult', data=df, multiple="dodge", shrink=.8)

Result: [image of the resulting count plot]

I want to convert this plot to show proportions instead of counts:

  1. The proportion of each hue value (win, loss, draw) computed within each category on x (e.g., wins as a fraction of all the data points in class blitz).
  2. The proportion of each hue value computed as a fraction of all the data points across all classes on x.

Should I modify my data frame to compute the proportions and then plot them, or is there an easier way to do this using built-in functionality?

Anticipated result-1

On the Y-axis I want proportions computed within each category of X instead of counts.

Proportion computation for blitz on X:
  win  = count(win) / count(blitz)
  loss = count(loss) / count(blitz)
  draw = count(draw) / count(blitz)....

Anticipated result-2

On the Y-axis I want proportions computed as a fraction of the data points in the entire dataset instead of counts.

Proportion computation for blitz on X:
  win  = count(win) / count(all data points of df)
  loss = count(loss) / count(all data points of df)
  draw = count(draw) / count(all data points of df)....

Upvotes: 0

Views: 1477

Answers (1)

JohanC

Reputation: 80289

You can use pandas' groupby to compute the counts and, from those, the percentages. sns.barplot can then draw the plot from the aggregated data.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Synthetic stand-in for the asker's data
df = pd.DataFrame({'time_class': np.random.choice(['blitz', 'rapid', 'bullet'], 5000, p=[.6, .1, .3]),
                   'Det_poiResult': np.random.choice(['win', 'loss', 'draw'], 5000, p=[.49, .48, .03])})
# Count of each (time_class, result) pair
df_counts = df.groupby(['time_class', 'Det_poiResult']).size()

# Percentage of each result within its own time_class
df_pcts1 = (100 * df_counts / df_counts.groupby(level=0).transform('sum')).to_frame(name="Percent").reset_index()
# Percentage of each result over the whole dataset, shown as an extra "overall" category
df_pcts2 = (df_counts.groupby(level=1).sum() * 100 / len(df)).to_frame(name="Percent").reset_index()
df_pcts2['time_class'] = "overall"

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
ax = sns.barplot(data=pd.concat([df_pcts1, df_pcts2], ignore_index=True), y='Percent',
                 x='time_class', order=['blitz', 'rapid', 'bullet', 'overall'],
                 hue='Det_poiResult', hue_order=['win', 'loss', 'draw'], palette=['dodgerblue', 'tomato', 'chartreuse'])
ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1.02))
ax.axvline(2.5, ls='--', color='0.4')
plt.tight_layout()
plt.show()

[Figure: sns.barplot from aggregated dataframes]
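
As an alternative way to get both sets of proportions from the question, pd.crosstab with its normalize argument may be shorter than the groupby route. The following is only a minimal sketch: the six-row sample frame is made up for illustration, and the names prop_within_class, prop_of_total and long_within are hypothetical, not part of the question or the answer above.

```python
import pandas as pd

# Made-up sample; the column names match the question, the values are illustrative only
df = pd.DataFrame({
    'time_class':    ['blitz', 'blitz', 'blitz', 'blitz', 'rapid', 'rapid'],
    'Det_poiResult': ['win',   'win',   'loss',  'draw',  'win',   'loss'],
})

# Anticipated result-1: normalize per row, so each time_class sums to 1
prop_within_class = pd.crosstab(df['time_class'], df['Det_poiResult'], normalize='index')

# Anticipated result-2: normalize over the whole table, so all cells together sum to 1
prop_of_total = pd.crosstab(df['time_class'], df['Det_poiResult'], normalize='all')

# Long format, suitable for sns.barplot(x='time_class', y='Proportion', hue='Det_poiResult')
long_within = prop_within_class.reset_index().melt(
    id_vars='time_class', var_name='Det_poiResult', value_name='Proportion')

print(prop_within_class)
print(prop_of_total)
```

Passing long_within (or the same reshaping of prop_of_total) as data to sns.barplot should give the dodged bars per time class, with proportions rather than counts on the Y-axis.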

Upvotes: 1
