Reputation: 8014
I want to plot the distribution of my data, ideally so that it looks like a normal distribution curve.
I'm not sure how to do that.
I tried plt.hist, but it didn't work: the plot shows only one column!
Here is my code:
import pymssql
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
conn = pymssql.connect(server='MyServer', database='MyDB')
df = pd.read_sql('EXEC [Stat_EDFlow] [2018-03-01], [2019-02-28]', conn, index=False)
conn.close()
plt.hist(df['MyColumn'])
plt.show()
Upvotes: 0
Views: 65
Reputation: 411
I think you're looking for the bins= keyword.
You can pass either an integer (the number of bins you want) or a sequence of bin edges, for example np.arange(min, max, dist).
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html
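As a rough sketch of the two forms bins= accepts, here is the same binning done with np.histogram, which is what plt.hist relies on for its counts. The data is synthetic (a stand-in for df['MyColumn']), and the seed, the 0 to 100 range and the step of 5 are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
# Synthetic stand-in for df['MyColumn'] (assumption: roughly normal data).
data = rng.normal(loc=50, scale=10, size=1000)

# Option 1: an integer asks for that many equal-width bins
# spanning the minimum to the maximum of the data.
counts_int, edges_int = np.histogram(data, bins=20)

# Option 2: an array gives the bin edges explicitly.
# np.arange(0, 105, 5) yields edges 0, 5, ..., 100, i.e. 20 bins.
edges = np.arange(0, 105, 5)
counts_arr, edges_arr = np.histogram(data, bins=edges)

print(len(counts_int), len(edges_int))
print(len(counts_arr), len(edges_arr))
```

Either value can be passed unchanged to plt.hist(data, bins=...). Note that n bins always come with n + 1 edges.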
EDIT: To get a line plot instead of bars, you can use something like:
import matplotlib.pyplot as plt
import numpy as np
synthetic = np.random.normal(size=100)  # synthetic sample data
fig = plt.figure(figsize=(5, 5))
y, binEdges = np.histogram(synthetic, bins=20)  # we want 20 bins
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])  # midpoint of each bin
plt.plot(bincenters, y, c='k')
plt.show()
Upvotes: 1
Reputation: 19885
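You get a single column because of the way the bins are calculated: by default they are equal-width and span the full range of the data, from the minimum to the maximum.
You most likely have some outliers in your data, which cause the plot to "zoom out" in an effort to show all of them, so nearly every value ends up in the same bin.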
One way you can resolve this issue is to remove the outliers (say, everything past the 95th percentile) and specify the number of bins:
df.loc[df['MyColumn'] < df['MyColumn'].quantile(0.95), 'MyColumn'].plot.hist(bins=25)
If this doesn't work, decrease the threshold from 0.95.
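To see the effect in isolation, here is a small sketch with made-up data: 990 ordinary values plus 10 extreme ones. The series s, the outlier value of 100000 and the seed are all assumptions for illustration, not taken from the question's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for df['MyColumn']: 990 ordinary values plus 10 outliers.
values = np.concatenate([rng.normal(50, 10, size=990), np.full(10, 100_000.0)])
s = pd.Series(values, name="MyColumn")

# With the outliers present, 10 equal-width bins span roughly 0 to 100000,
# so every ordinary value lands in the first bin.
counts_raw, _ = np.histogram(s.to_numpy(), bins=10)

# Keep only the values below the 95th percentile, then bin again.
trimmed = s[s < s.quantile(0.95)]
counts_trimmed, _ = np.histogram(trimmed.to_numpy(), bins=25)

print(counts_raw)
print(counts_trimmed)
```

The first print shows almost everything piled into one bin, which is the "one column" picture; after trimming, the counts spread across the bins. Calling trimmed.plot.hist(bins=25) would draw the second case.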
Another way is to specify the bins directly:
df['MyColumn'].plot.hist(bins=np.linspace(0, 100, 25))
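Two details of explicit edges are easy to miss, so here is a minimal check using np.histogram, which matplotlib uses for the binning. The data values are made up for illustration, and the 0 to 100 range is only the example range from the line above, so adjust it to your own data.

```python
import numpy as np

# 25 evenly spaced edges from 0 to 100, both endpoints included.
edges = np.linspace(0, 100, 25)

# Made-up values, some of them outside the 0 to 100 range.
data = np.array([-5.0, 0.0, 10.0, 50.0, 99.9, 100.0, 250.0])

counts, _ = np.histogram(data, bins=edges)

n_bins = len(edges) - 1    # 25 edges define 24 bins, not 25
n_counted = counts.sum()   # values outside [0, 100] are not counted

print(n_bins, n_counted)
```

So np.linspace(0, 100, 26) would be needed for exactly 25 bins, and anything outside the chosen range is silently left out of the histogram, which is what hides the outliers.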
Upvotes: 1