Reputation: 490
Plotting multiple relative-frequency histograms (where the bar heights sum to one, rather than the area under the bars) turned out to be harder than I expected.
In Method A, passing the weights argument gives the correct plot, but it is not intuitive.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df_a = pd.DataFrame(np.random.randn(1000), columns=['a'])
df_b = pd.DataFrame(1 + np.random.randn(100), columns=['b'])
# Method A: weight each sample by 1/N so the bar heights sum to one
ax = df_a.plot(kind='hist', weights=np.ones_like(df_a) / len(df_a), alpha=0.5)
df_b.plot(kind='hist', weights=np.ones_like(df_b) / len(df_b), alpha=0.5, ax=ax)
plt.title("Method A")
plt.show()
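To see why the weights trick works, here is a minimal NumPy-only sketch. It assumes 10 equal-width bins and a seeded generator (both are choices made for this illustration). Each sample contributes 1/N instead of 1, so the weighted bin totals are the same as the raw counts divided by N.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (an assumption, not in the original post)
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)

# Every sample contributes 1/N to its bin instead of 1
weights = np.ones_like(a) / len(a)
rel_freq, edges = np.histogram(a, bins=10, weights=weights)

# Plain counts on the same bins, for comparison
counts, _ = np.histogram(a, bins=10)

print(rel_freq.sum())                                 # total relative frequency
print(np.allclose(rel_freq, counts / counts.sum()))   # same as normalizing the counts
```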
In Method B, the step that computes the relative frequencies, count_a/sum(count_a),
is easy to understand, but the resulting plot does not look good.
# Method B: compute the counts with NumPy, then normalize them by hand
count_a, bins_a = np.histogram(df_a.a)
count_b, bins_b = np.histogram(df_b.b)
plt.bar(bins_a[:-1], count_a / sum(count_a), alpha=0.5)
plt.bar(bins_b[:-1], count_b / sum(count_b), alpha=0.5)
plt.title("Method B")
plt.show()
Is there another way to get a graph directly from the data without doing the calculations myself?
Upvotes: 0
Views: 298
Reputation: 1130
The problem with your bar plot is that the bar width defaults to 0.8. You can set it to the actual bin width of your histogram instead:
plt.bar(bins_a[:-1], count_a/sum(count_a), width = bins_a[1:] - bins_a[:-1], alpha = 0.5, align = 'edge')
In this example the bin width is constant, but passing a sequence of widths is the more flexible option, because it also works when the bins have different sizes.
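As a rough sketch of the variable-bin case, the snippet below computes the inputs that plt.bar would need. The edge values are hypothetical, chosen only for this illustration, and samples falling outside them are simply not counted.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (an assumption, not in the original post)
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)

# Hypothetical, unevenly spaced bin edges chosen for illustration
edges = np.array([-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0])

counts, edges = np.histogram(a, bins=edges)
rel_freq = counts / counts.sum()   # bar heights sum to one
widths = np.diff(edges)            # one width per bin, same as edges[1:] - edges[:-1]
lefts = edges[:-1]                 # left edge of each bin

# With matplotlib imported as plt, these arrays plug straight into:
# plt.bar(lefts, rel_freq, width=widths, align='edge', alpha=0.5)
```

With align='edge', each bar starts at its left bin edge and spans exactly its own bin, so neighbouring bars touch without overlapping.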
Another option is to use seaborn, as suggested in the comments:
import seaborn as sns
df_hist = pd.concat([df_a, df_b]).melt()
sns.histplot(data = df_hist, x = 'value', hue = 'variable', stat = 'probability', common_norm = False)
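For comparison, here is a NumPy-only sketch of roughly what stat = 'probability' with common_norm = False amounts to: both groups share one set of bin edges (seaborn's common_bins default), and each group is normalized by its own total. The bin count of 10 and the seeded data are assumptions for this illustration, and seaborn's automatic bin selection will generally pick different edges.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (an assumption, not in the original post)
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = 1 + rng.standard_normal(100)

# One shared set of edges spanning both samples
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)

count_a, _ = np.histogram(a, bins=edges)
count_b, _ = np.histogram(b, bins=edges)

# Normalize each group by its own total, so each set of bars sums to one
rel_a = count_a / count_a.sum()
rel_b = count_b / count_b.sum()
```

Sharing the edges keeps the bars of both groups aligned, which makes the two distributions directly comparable even though the sample sizes differ by a factor of ten.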
Upvotes: 1