Python chart of natural distribution

Question

I want to plot the distribution of my data (something like a normal-distribution bell curve), but I'm not sure how to do that.

I tried plt.hist, but the plot shows only a single bar.

Here is my code:

import pymssql
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

conn = pymssql.connect(server='MyServer', database='MyDB')

# Load the stored procedure's result set into a DataFrame
df = pd.read_sql("EXEC [Stat_EDFlow] '2018-03-01', '2019-02-28'", conn)
conn.close()

plt.hist(df['MyColumn'])
plt.show()

gmds · Accepted Answer

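This happens because of how the bins are calculated. By default, plt.hist splits the full range of the data, from its minimum to its maximum, into 10 equal-width bins.

You have some outliers in your data, and they stretch that range so the plot "zooms out" to show all of them. Almost all of the ordinary values then fall into the first bin, which is why you see only one bar.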
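To see why a single bar appears, here is a minimal sketch using NumPy on made-up data: 1,000 values around 50 plus two hypothetical outliers at 5,000 and 10,000, standing in for MyColumn. np.histogram with bins=10 reproduces the default binning that plt.hist uses:

```python
import numpy as np

# Hypothetical stand-in for df['MyColumn']: 1,000 values centred on 50,
# plus two extreme outliers.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc=50, scale=10, size=1000), [5000.0, 10000.0]])

# plt.hist defaults to 10 equal-width bins spanning min..max;
# np.histogram(values, bins=10) computes the same edges and counts.
counts, edges = np.histogram(values, bins=10)

bin_width = edges[1] - edges[0]
print(f"bin width: {bin_width:.0f}")   # each bin is roughly 1,000 units wide
print(counts)                          # the first bin holds all 1,000 ordinary values
```

With bins about 1,000 units wide, the whole bulk of the data (roughly 20 to 80) fits inside the first bin, and the two outliers each get a nearly invisible bar of height 1.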

One way you can resolve this issue is to remove the outliers (say, everything past the 95th percentile) and specify the number of bins:

df.loc[df['MyColumn'] < df['MyColumn'].quantile(0.95), 'MyColumn'].plot.hist(bins=25)
plt.show()

If you still see only one bar, lower the threshold below 0.95 (for example, to 0.9).
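The percentile filter can also be written as a small NumPy function, which makes it easy to inspect the cutoff and the resulting bins before plotting. This is a sketch on the same kind of synthetic data; the function name trimmed_histogram and its defaults are hypothetical, not part of pandas or matplotlib:

```python
import numpy as np


def trimmed_histogram(values, upper_quantile=0.95, bins=25):
    """Keep only values below the given quantile, then bin what is left.

    Hypothetical helper mirroring
    df.loc[df['MyColumn'] < df['MyColumn'].quantile(0.95), 'MyColumn'].
    """
    values = np.asarray(values, dtype=float)
    cutoff = np.quantile(values, upper_quantile)  # linear interpolation, like pandas
    kept = values[values < cutoff]
    counts, edges = np.histogram(kept, bins=bins)
    return counts, edges, cutoff


# Synthetic data: 1,000 values around 50 plus two outliers.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc=50, scale=10, size=1000), [5000.0, 10000.0]])

counts, edges, cutoff = trimmed_histogram(values)
print(f"cutoff: {cutoff:.1f}, values kept: {counts.sum()}")
print(f"bins now span {edges[0]:.1f} to {edges[-1]:.1f}")  # the outliers no longer set the range
```

np.quantile and pandas Series.quantile both interpolate linearly by default, so the cutoff should match df['MyColumn'].quantile(0.95) on the same values.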

Another way is to specify the bins directly:

# 25 evenly spaced edges from 0 to 100, i.e. 24 bins; adjust the range to your data
df['MyColumn'].plot.hist(bins=np.linspace(0, 100, 25))
plt.show()
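When specifying the bins directly, keep in mind that np.linspace(0, 100, 25) returns 25 bin edges, which define 24 bins, and that values outside 0 to 100 are left out of the histogram. A short NumPy check on hypothetical data illustrates both points:

```python
import numpy as np

# Hypothetical data: 1,000 values around 50 plus two outliers far above 100.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc=50, scale=10, size=1000), [5000.0, 10000.0]])

# 25 evenly spaced edges from 0 to 100 define 24 bins, each 100/24 wide.
edges = np.linspace(0, 100, 25)
counts, _ = np.histogram(values, bins=edges)

print(len(edges), len(counts))   # 25 edges, 24 bins
print(counts.sum())              # values outside 0..100, including both outliers, are not counted
```

If you would rather keep an integer bin count and only restrict the range, plt.hist and np.histogram also accept a range=(low, high) argument.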
