asmgx

Reputation: 8014

Python chart of a normal distribution

I want to plot my data as a normal distribution curve.

I'm not sure how to do that.

I tried using plt.hist, but it didn't work: I only got a single bar.

Here is my code:

import pymssql
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Connect to SQL Server and load the stored procedure's result into a DataFrame
conn = pymssql.connect(server='MyServer', database='MyDB')

df = pd.read_sql('EXEC [Stat_EDFlow] [2018-03-01], [2019-02-28]', conn)
conn.close()

# Histogram of one column; plt.hist uses 10 equal-width bins by default
plt.hist(df['MyColumn'])
plt.show()

Upvotes: 0

Views: 65

Answers (2)

Ocean Scientist

Reputation: 411

I think you're looking for the bins= keyword argument. You can pass either an integer (the number of bins you want) or an array of bin edges, such as np.arange(min, max, step). https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html
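A minimal sketch of the two forms of bins, using np.histogram (which computes the same counts that plt.hist draws) on synthetic normal data. The data and the bin width of 5 are assumptions for illustration. Note that np.arange excludes its stop value, so the stop is set one step past 100 to keep 100 as the last edge.

```python
import numpy as np

# Synthetic data for illustration only (assumed roughly normal around 50)
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)

# bins as an integer: that many equal-width bins spanning min..max of the data
counts_int, edges_int = np.histogram(data, bins=20)

# bins as an array: you choose the edges yourself (here width 5 from 0 to 100)
edges = np.arange(0, 105, 5)  # 0, 5, ..., 100 -> 21 edges, 20 bins
counts_arr, edges_arr = np.histogram(data, bins=edges)

print(len(counts_int), len(edges_int))  # 20 21
print(len(counts_arr), len(edges_arr))  # 20 21
```

The same values can be passed straight to matplotlib, e.g. plt.hist(data, bins=20) or plt.hist(data, bins=edges).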

EDIT: To draw the histogram as a line plot instead of bars, you can use something like:

import matplotlib.pyplot as plt
import numpy as np

synthetic = np.random.normal(size=100)  # example data; use df['MyColumn'] for your own
fig = plt.figure(figsize=(5, 5))
y, binEdges = np.histogram(synthetic, bins=20)  # we want 20 bins
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])  # midpoint of each bin
plt.plot(bincenters, y, c='k')
plt.show()
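If the aim is the smooth bell curve from the question, one option not in the original answer is to overlay a normal density fitted from the sample mean and standard deviation. This is a sketch on synthetic data (synthetic stands in for df['MyColumn']); the helper normal_pdf is hypothetical and written out by hand to avoid a SciPy dependency.

```python
import numpy as np
import matplotlib.pyplot as plt


def normal_pdf(x, mu, sigma):
    """Normal probability density, written out by hand (hypothetical helper)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
synthetic = rng.normal(size=500)  # stand-in for df['MyColumn']

# Fit the curve from the sample's mean and standard deviation
mu, sigma = synthetic.mean(), synthetic.std()
x = np.linspace(synthetic.min(), synthetic.max(), 200)

# density=True rescales the bars so their total area is 1, like the pdf
plt.hist(synthetic, bins=20, density=True, alpha=0.5)
plt.plot(x, normal_pdf(x, mu, sigma), c='k')
plt.show()
```

Because density=True puts the bars on the same scale as the curve, any mismatch between the bars and the line shows how far the data is from a normal distribution.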

Upvotes: 1

gmds

Reputation: 19885

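This happens because of the way the bins are calculated: by default, plt.hist divides the whole range of the data, from its minimum to its maximum, into 10 equal-width bins.

You have some outliers in your data. They stretch that range so far that almost all of your values end up in the first bin, which is why you see only one bar.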
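A small sketch of that effect on made-up data: 995 ordinary values plus five extreme outliers (both are assumptions for illustration). With 10 equal-width bins, the same default plt.hist uses, the outliers stretch the range so far that every ordinary value lands in the first bin.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 995 typical values plus 5 extreme outliers
typical = rng.normal(loc=30, scale=5, size=995)
outliers = np.array([5000.0, 6000.0, 7000.0, 8000.0, 9000.0])
data = np.concatenate([typical, outliers])

# 10 equal-width bins from min to max, like plt.hist's default
counts, edges = np.histogram(data, bins=10)
print(counts)  # the first bin holds all 995 typical values
```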

One way to resolve this is to remove the outliers (say, everything above the 95th percentile) and specify the number of bins:

df.loc[df['MyColumn'] < df['MyColumn'].quantile(0.95), 'MyColumn'].plot.hist(bins=25)

If a single bar still dominates, lower the threshold below 0.95.
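The filtering step can be checked without plotting. This sketch builds a hypothetical stand-in for df (normal values plus a few outliers) and applies the same quantile mask as above; np.histogram computes the 25 bin counts that .plot.hist(bins=25) would draw.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical stand-in for the question's DataFrame
values = np.concatenate([rng.normal(30, 5, 995), [5000, 6000, 7000, 8000, 9000]])
df = pd.DataFrame({'MyColumn': values})

# Keep only the values below the 95th percentile
threshold = df['MyColumn'].quantile(0.95)
trimmed = df.loc[df['MyColumn'] < threshold, 'MyColumn']

counts, edges = np.histogram(trimmed, bins=25)
# trimmed.plot.hist(bins=25) would draw these same 25 bins
```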

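Another way is to specify the bin edges directly, for example evenly spaced edges between 0 and 100 (adjust the range to your data; values outside it are not counted):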

df['MyColumn'].plot.hist(bins=np.linspace(0, 100, 25))
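One detail worth knowing about explicit edges: np.linspace(0, 100, 25) returns 25 edges, which define 24 bins, and values outside [0, 100] are left out of the counts (the right-most edge itself is included). A quick check with a few hand-picked values:

```python
import numpy as np

edges = np.linspace(0, 100, 25)  # 25 edges -> 24 bins
data = np.array([-5.0, 0.0, 50.0, 99.0, 100.0, 150.0])  # hand-picked test values

counts, _ = np.histogram(data, bins=edges)
print(len(counts), counts.sum())  # 24 4  (-5 and 150 fall outside the edges)
```

For exactly 25 bins, pass 26 edges: np.linspace(0, 100, 26).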

Upvotes: 1
