masaya

Reputation: 490

Easier way to plot multiple Relative Frequencies

Plotting multiple relative-frequency histograms (where the bar heights sum to one, rather than the bar areas) turned out to be harder than I expected.

In Method A, the weights argument produces a correct plot, but it is not intuitive.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df_a = pd.DataFrame(np.random.randn(1000),columns=['a'])
df_b = pd.DataFrame(1+ np.random.randn(100),columns=['b'])

# Method A
ax = df_a.plot(kind='hist', weights=np.ones_like(df_a) / len(df_a), alpha=0.5)
df_b.plot(kind='hist', weights=np.ones_like(df_b) / len(df_b), alpha=0.5, ax=ax)
plt.title("Method A")
plt.show()

[Image: output of Method A]

In Method B, the step that computes the relative frequencies, count_a/sum(count_a), is easy to understand, but the resulting plot does not look good.

# Method B
count_a,bins_a = np.histogram(df_a.a)
count_b,bins_b = np.histogram(df_b.b)
plt.bar(bins_a[:-1],count_a/sum(count_a),alpha=0.5 )
plt.bar(bins_b[:-1],count_b/sum(count_b),alpha=0.5 )
plt.title("Method B")
plt.show()

[Image: output of Method B]

Is there another way to get a graph directly from the data without doing the calculations myself?

Upvotes: 0

Views: 298

Answers (1)

baccandr

Reputation: 1130

The problem with your bar plot is that plt.bar uses a default bar width of 0.8 and centers each bar on its x value. You can set the width to the actual bin width of your histogram and align the bars to the left bin edges:

plt.bar(bins_a[:-1], count_a/sum(count_a), width = bins_a[1:] - bins_a[:-1], alpha = 0.5, align = 'edge')

With the width set this way, each bar covers its bin exactly and the gaps disappear.

In this example the bin width is constant, but passing a sequence of widths is more flexible and also works with variable bin sizes.
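To illustrate the variable-bin case, here is a minimal sketch with hand-picked, unequal bin edges. The helper name relative_frequencies and the edge values are made up for this example; only np.histogram and np.diff are actual NumPy calls.

```python
import numpy as np

def relative_frequencies(values, bins):
    """Return (heights, edges); heights sum to 1 over the values that fall in the bins."""
    counts, edges = np.histogram(values, bins=bins)
    total = counts.sum()
    # Guard against an empty selection to avoid dividing by zero.
    heights = counts / total if total > 0 else counts.astype(float)
    return heights, edges

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)

# Hypothetical unequal bin edges: narrow near the centre, wide in the tails.
edges_in = np.array([-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0])
heights, edges = relative_frequencies(sample, edges_in)
widths = np.diff(edges)  # one width per bin, same as edges[1:] - edges[:-1]
```

The bars can then be drawn with plt.bar(edges[:-1], heights, width=widths, align='edge', alpha=0.5). Note that values outside the outermost edges are not counted, so the heights sum to one over the binned values only.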

A different option is to use seaborn, as suggested in the comments:

import seaborn as sns
df_hist = pd.concat([df_a, df_b]).melt()
sns.histplot(data = df_hist, x = 'value', hue = 'variable', stat = 'probability', common_norm = False)
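For reference, the two keywords do roughly what Method B did by hand: stat='probability' scales the bar heights so they sum to one, and common_norm=False applies that scaling to each hue group separately instead of to the pooled data. Below is a rough NumPy sketch of the same calculation, assuming bin edges shared across groups (seaborn's default, common_bins=True). The helper name per_group_probability is hypothetical, and this is not seaborn's actual implementation.

```python
import numpy as np

def per_group_probability(groups, bins=10):
    """Normalise each group's histogram to sum to 1, using shared bin edges."""
    # Shared edges are computed from the pooled data so the bars line up.
    pooled = np.concatenate(list(groups.values()))
    edges = np.histogram_bin_edges(pooled, bins=bins)
    probs = {}
    for name, values in groups.items():
        counts, _ = np.histogram(values, bins=edges)
        # Divide by this group's own total, not the pooled total.
        probs[name] = counts / counts.sum()
    return probs, edges

rng = np.random.default_rng(0)
groups = {"a": rng.normal(size=1000), "b": 1 + rng.normal(size=100)}
probs, edges = per_group_probability(groups)
```

Because each group is divided by its own count, the 100-point sample is drawn on the same scale as the 1000-point one, which is what makes the two distributions comparable.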


Upvotes: 1
