gushady

Reputation: 5

How to plot histogram and distribution from frequency table?

I have a frequency table

frequency table

and I have been trying to plot this data into something like this,

histogram with distribution curve

so I tried this:

to_plot = compare_df[['counts', 'theoritical counts']]
bins=[0,2500,5000,7500,10000,12500,15000,17500,20000]
sns.displot(to_plot,bins=bins)

but it turned out like this:

plot

Any idea what I did wrong? Please help.

Upvotes: 0

Views: 2218

Answers (2)

JohanC

Reputation: 80349

First off, note that you lose important information when creating a kde plot only from frequencies.
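As a rough illustration of that loss (made-up numbers, NumPy only): the two hypothetical samples below sit at opposite ends of each bin, yet they produce the same frequency table, so nothing computed from the counts alone can tell them apart.

```python
import numpy as np

bins = np.arange(0, 20001, 2500)

# Hypothetical data: the same number of values per bin in both samples,
# but placed at different positions inside each bin.
per_bin = np.array([3, 5, 8, 6, 4, 2, 1, 1])
low = np.repeat(bins[:-1] + 100, per_bin)   # values near the left edge of each bin
high = np.repeat(bins[1:] - 100, per_bin)   # values near the right edge of each bin

counts_low, _ = np.histogram(low, bins=bins)
counts_high, _ = np.histogram(high, bins=bins)

print(counts_low)                 # frequency table of the first sample
print(counts_high)                # identical frequency table for the second sample
print(low.mean(), high.mean())    # the underlying samples clearly differ
```

A kde built from the frequencies can only guess where the values lie within each bin, which is why the curve should be read as an approximation.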

sns.histplot() has a parameter weights= which can handle the frequencies. I didn't see a way to get this to work using a long dataframe and hue, but you can call histplot separately for each column. Here is an example starting from generated data:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

sns.set()
bins = np.array([0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
df = pd.DataFrame({'counts': np.random.randint(2, 30, 8),
                   'theoretical counts': np.random.randint(2, 30, 8)},
                  index=pd.interval_range(0, 20000, freq=2500))
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4
fig, ax = plt.subplots()
for column, color in zip(['counts', 'theoretical counts'], ['cornflowerblue', 'crimson']):
    # one x value per bin (its midpoint), weighted by that bin's frequency
    sns.histplot(x=(bins[:-1] + bins[1:]) / 2, weights=df[column], bins=8, binrange=(0, 20000),
                 kde=True, kde_kws={'cut': .3},
                 color=color, alpha=0.5, label=column, ax=ax)
ax.legend()
ax.set_xticks(range(0, 20001, 2500))
plt.show()

sns.histplot from frequencies
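To see why passing the bin midpoints together with weights= recovers the table, here is a small check using np.histogram directly, which as far as I know is what histplot relies on for the binning. The frequencies are made up for illustration.

```python
import numpy as np

bins = np.array([0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
freq = np.array([4, 9, 15, 12, 7, 3, 2, 1])  # made-up frequencies, one per bin

# One representative x value per bin: its midpoint.
midpoints = (bins[:-1] + bins[1:]) / 2

# Each midpoint falls into exactly one bin, so the weighted histogram
# simply returns the frequency attached to that midpoint.
heights, edges = np.histogram(midpoints, bins=8, range=(0, 20000), weights=freq)

print(heights)
print(edges)
```

Note that this only works because bins=8 and the range (0, 20000) reproduce the original bin edges; with different edges, several midpoints could land in the same bin.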

With strongly varying bin widths, there isn't enough information for a suitable kde curve. Also, a bar plot seems more appropriate than a histogram. Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

sns.set()
bins = [0, 250, 500, 1000, 1500, 2500, 5000, 10000, 50000, np.inf]
bin_labels = [f'{b0}-{b1}' for b0, b1 in zip(bins[:-1], bins[1:])]
df = pd.DataFrame({'counts': np.random.randint(2, 30, 9),
                   'theoretical counts': np.random.randint(2, 30, 9)})
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4
fig, ax = plt.subplots(figsize=(10, 4))
sns.barplot(data=df.melt(), x=np.tile(bin_labels, 2), y='value',
            hue='variable', palette=['cornflowerblue', 'crimson'], ax=ax)
plt.tight_layout()
plt.show()

bar plots

sns.barplot() has some further options; for example, dodge=False, alpha=0.5 draws both sets of bars at the same spot, semi-transparently.
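A minimal sketch of that overlaid variant, again on generated data. The 'bin' column and the long_df name are just my choices for this example, and I haven't checked the exact rendering on every seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use your default
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

bins = [0, 250, 500, 1000, 1500, 2500, 5000, 10000, 50000, np.inf]
bin_labels = [f'{b0}-{b1}' for b0, b1 in zip(bins[:-1], bins[1:])]

# Generated data, one row per bin.
rng = np.random.default_rng(0)
df = pd.DataFrame({'bin': bin_labels,
                   'counts': rng.integers(2, 30, 9),
                   'theoretical counts': rng.integers(2, 30, 9)})

# Long form: columns 'bin', 'variable', 'value'.
long_df = df.melt(id_vars='bin')

fig, ax = plt.subplots(figsize=(10, 4))
# dodge=False puts both hue levels at the same x position;
# alpha=0.5 keeps the bar underneath visible.
sns.barplot(data=long_df, x='bin', y='value', hue='variable',
            dodge=False, alpha=0.5,
            palette=['cornflowerblue', 'crimson'], ax=ax)
plt.tight_layout()
# plt.show()  # uncomment to display interactively
```

Putting the bin labels into their own column before melting avoids having to tile them by hand.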

Upvotes: 1

mechanical_meat

Reputation: 169374

Couple of things:

  1. When you pass a DataFrame to sns.displot, you also need to specify which column to use for the distribution via the x kwarg.

  2. This leads to the second issue: I don't know of a way to get multiple distributions using sns.displot, but you can use sns.histplot in approximately this way:

import matplotlib.pyplot as plt
import seaborn as sns 

titanic = sns.load_dataset('titanic')

ax = sns.histplot(data=titanic, x='age', bins=30, color='r', alpha=.25,
                  label='age')
sns.histplot(data=titanic, x='fare', ax=ax, bins=30, color='b', alpha=.25,
             label='fare')
ax.legend()
plt.show()

Result below, and please note that I just used an example dataset to get you a rough image as quickly as possible:

overlaid histograms of age and fare

Upvotes: 0
