Reputation: 5
I have a frequency table (image: frequency table) and I have been trying to plot this data as something like this (image: histogram with distribution curve), so I tried this:
to_plot = compare_df[['counts', 'theoritical counts']]
bins=[0,2500,5000,7500,10000,12500,15000,17500,20000]
sns.displot(to_plot,bins=bins)
But it turned out like this (image: plot).
Any idea what I did wrong? Please help.
Upvotes: 0
Views: 2218
Reputation: 80349
First off, note that you lose important information when creating a kde plot only from frequencies. sns.histplot() has a parameter weights= which can handle the frequencies. I didn't see a way to get this to work using a long dataframe and hue, but you can call histplot separately for each column. Here is an example starting from generated data:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

sns.set()
# Bin edges; the frequency table has one row per bin
bins = np.array([0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
# Generated example data standing in for the real frequency table
df = pd.DataFrame({'counts': np.random.randint(2, 30, 8),
                   'theoretical counts': np.random.randint(2, 30, 8)},
                  index=pd.interval_range(0, 20000, freq=2500))
# Make the theoretical counts resemble the observed ones
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4

fig, ax = plt.subplots()
for column, color in zip(['counts', 'theoretical counts'], ['cornflowerblue', 'crimson']):
    # Use each bin's midpoint as x and its frequency as the weight
    sns.histplot(x=(bins[:-1] + bins[1:]) / 2, weights=df[column], bins=8, binrange=(0, 20000),
                 kde=True, kde_kws={'cut': .3},
                 color=color, alpha=0.5, label=column, ax=ax)
ax.legend()
ax.set_xticks(range(0, 20001, 2500))
plt.show()
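To see what weights= is doing in the example above, here is a minimal NumPy-only sketch. The counts are made-up numbers, not the asker's data, and np.histogram is only standing in for the binning step that histplot performs. It also shows one likely reason the original displot call looked wrong: if the counts are small compared to the bin edges, binning the count values themselves puts every row in the first bin.

```python
import numpy as np

# Hypothetical frequency table: one count per 2500-wide bin (made-up numbers).
bins = np.arange(0, 20001, 2500)
counts = np.array([5, 12, 20, 17, 9, 6, 3, 1])

# One representative x value per bin (its midpoint), weighted by the bin's count.
midpoints = (bins[:-1] + bins[1:]) / 2
weighted, _ = np.histogram(midpoints, bins=bins, weights=counts)

# Without weights, each midpoint is one observation, so every bar has height 1.
unweighted, _ = np.histogram(midpoints, bins=bins)

# Passing the counts as the data bins the count *values* instead:
# all eight numbers are below 2500, so they all fall into the first bin.
counts_as_data, _ = np.histogram(counts, bins=bins)

print(weighted)
print(unweighted)
print(counts_as_data)
```

The weighted version reproduces the frequency table, which is why the bars in the histplot example have the right heights.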
With widely varying bin widths, there isn't enough information for a suitable kde curve. Also, a bar plot seems more appropriate than a histogram. Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set()
# Unevenly spaced bin edges; the last bin is open-ended
bins = [0, 250, 500, 1000, 1500, 2500, 5000, 10000, 50000, np.inf]
bin_labels = [f'{b0}-{b1}' for b0, b1 in zip(bins[:-1], bins[1:])]
# Generated example data, one row per bin
df = pd.DataFrame({'counts': np.random.randint(2, 30, 9),
                   'theoretical counts': np.random.randint(2, 30, 9)})
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4
fig, ax = plt.subplots(figsize=(10, 4))
# melt() stacks both columns, so the bin labels are repeated once per column
sns.barplot(data=df.melt(), x=np.tile(bin_labels, 2), y='value',
            hue='variable', palette=['cornflowerblue', 'crimson'], ax=ax)
plt.tight_layout()
plt.show()
sns.barplot() has some options, for example dodge=False, alpha=0.5 to draw the bars at the same spot.
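On the np.tile(bin_labels, 2) call in the bar plot example: it relies on df.melt() stacking the two columns one after the other. A small pandas-only sketch of that reshaping, using shortened bin edges and made-up counts (the 'bin' column name is my own choice for illustration):

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical bin edges and counts for illustration only.
bins = [0, 250, 500, 1000]
bin_labels = [f'{b0}-{b1}' for b0, b1 in zip(bins[:-1], bins[1:])]
df = pd.DataFrame({'counts': [4, 9, 6],
                   'theoretical counts': [5, 8, 7]})

# melt() returns the columns 'variable' and 'value': first all rows of
# 'counts', then all rows of 'theoretical counts'.
long_df = df.melt()

# Repeating the labels once per original column lines them up with the rows.
long_df['bin'] = np.tile(bin_labels, 2)

print(long_df)
```

With the labels stored as a column, it should also be possible to pass x='bin' together with data=long_df to sns.barplot() instead of a separate array.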
Upvotes: 1
Reputation: 169374
A couple of things:
1. When you provide a DataFrame to sns.displot, you also need to specify which column to use for the distribution as the x kwarg.
2. This leads into the second issue: I don't know of a way to get multiple distributions using sns.displot, but you can use sns.histplot in approximately this way:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset('titanic')
ax = sns.histplot(data=titanic, x='age', bins=30, color='r', alpha=.25,
                  label='age')
sns.histplot(data=titanic, x='fare', ax=ax, bins=30, color='b', alpha=.25,
             label='fare')
ax.legend()
plt.show()
Result below, and please note that I just used an example dataset to get you a rough image as quickly as possible:
Upvotes: 0