Reputation: 89
I have gone through various posts on this forum, but I cannot find an answer to the behaviour I am seeing.
I have a CSV file whose header has many entries, with 300 points each. For each field (column of the CSV file) I would like to plot a histogram. The x-axis contains the elements of that column and the y-axis should show the number of samples that fall inside each bin. As I have 300 points, the number of samples in all bins added together should be 300, so the y-axis should go from 0 to, let's say, 50 (just an example). However, the values are gigantic (400e8), which makes no sense.
1 | 250.23e-9
2 | 250.123e-9
... | ...
300 | 251.34e-9
Please check my code, below. I am using pandas to open the csv and Matplotlib for the rest.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

df = pd.read_csv("/home/pcardoso/raw_data/myData.csv")

# Figure parameters
figPath = '/home/pcardoso/scripts/python/matplotlib/figures/'
figPrefix = 'hist_'  # Prefix to the name of the file.
figSuffix = '_something'  # Suffix to the name of the file.
figString = ''  # Full string passed as the figure name to be saved
precision = 3
num_bins = 50

columns = list(df)
for fieldName in columns:
    vectorData = df[fieldName]

    # Statistical data
    mu = np.mean(vectorData)  # mean of distribution
    sigma = np.std(vectorData)  # standard deviation of distribution

    # Create plot instance
    fig, ax = plt.subplots()

    # Histogram
    n, bins, patches = ax.hist(vectorData, num_bins, density='True', alpha=0.75, rwidth=0.9, label=fieldName)
    ax.legend()

    # Best-fit curve
    y = mlab.normpdf(bins, mu, sigma)
    ax.plot(bins, y, '--')

    # Setting axis names, grid and title
    # (eng_notation is a helper defined elsewhere, not shown here)
    ax.set_xlabel(fieldName)
    ax.set_ylabel('Number of points')
    ax.set_title(fieldName + r': $\mu=$' + eng_notation(mu, precision) + r', $\sigma=$' + eng_notation(sigma, precision))
    ax.grid(True, alpha=0.2)
    fig.tight_layout()  # Tweak spacing to prevent clipping of ylabel

    # Saving figure
    figString = figPrefix + fieldName + figSuffix
    fig.savefig(figPath + figString)
    plt.show()
    plt.close(fig)
In summary, I would like to know how to get the y-axis values right.
Edit: 6 July 2020
Edit 08 June 2020: I would like the density curve to follow the histogram, like this:
Upvotes: 0
Views: 1112
Reputation: 1914
Don't use density='True'. With that option, the value displayed is the number of members in the bin divided by the total number of points and by the width of the bin, so that the histogram integrates to 1. If that width is small (as in your case, with rather small x-values), the values become large.
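To illustrate the normalisation, here is a minimal sketch using numpy.histogram, which takes the same density option. The data are synthetic values made up to resemble the question's column (around 250e-9); they are an assumption, not the real file.

```python
import numpy as np

# Synthetic stand-in for one CSV column (assumed values, not the real data).
rng = np.random.default_rng(0)
data = rng.normal(250e-9, 0.5e-9, 300)

counts, edges = np.histogram(data, bins=50)
density, _ = np.histogram(data, bins=50, density=True)
widths = np.diff(edges)

# Plain counts add up to the number of samples.
print(counts.sum())
# Density heights equal counts / (N * bin width), so tiny bins give huge heights.
print(np.allclose(density, counts / (counts.sum() * widths)))
# The area under the density histogram is 1.
print(np.isclose((density * widths).sum(), 1.0))
# Largest bar height with density=True.
print(density.max())
```

With bin widths of the order of 1e-11, dividing by the width alone is enough to push the bar heights far above the 0 to 50 range expected for plain counts.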
Edit: OK, to un-norm the normed curve, you need to multiply it by the number of points and by the width of one bin. I made a more reduced example:
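from numpy.random import normal
from scipy.stats import norm
import pylab

N = 300       # number of points
sigma = 10.0  # standard deviation of the test data
B = 30        # number of bins


def main():
    x = normal(0, sigma, N)
    # Plain counts: no density normalisation
    h, bins, _ = pylab.hist(x, bins=B, rwidth=0.8)
    bin_width = bins[1] - bins[0]
    # Scale the normal PDF by N * bin_width to get expected counts per bin
    h_n = norm.pdf(bins[:-1], 0, sigma) * N * bin_width
    pylab.plot(bins[:-1], h_n)
    pylab.show()


if __name__ == "__main__":
    main()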
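Applied to one column of the question's data, the same scaling could look like the sketch below. It assumes the column is available as a NumPy array, fits mu and sigma from the data as the question does, and evaluates the normal PDF by its formula so that only NumPy is needed. scaled_normal_curve is a hypothetical helper name, not part of any library, and the column values are synthetic.

```python
import numpy as np


def scaled_normal_curve(data, num_bins):
    """Return bin centers, a normal curve scaled to counts, and the counts.

    Hypothetical helper: fits mu and sigma from the data, then multiplies
    the normal PDF by N * bin_width so it matches a non-density histogram.
    """
    data = np.asarray(data, dtype=float)
    counts, edges = np.histogram(data, bins=num_bins)
    bin_width = edges[1] - edges[0]
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu, sigma = data.mean(), data.std()
    pdf = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return centers, pdf * data.size * bin_width, counts


# Synthetic stand-in for one column of the CSV (assumed values).
rng = np.random.default_rng(1)
column = rng.normal(250e-9, 0.5e-9, 300)
centers, curve, counts = scaled_normal_curve(column, 50)
print(counts.sum(), curve.max())
```

With Matplotlib, one would call ax.hist(column, bins=50) without the density option and then ax.plot(centers, curve, '--'); the y-axis then reads as the number of points per bin. Evaluating the curve at the bin centers rather than the left edges is a choice made here so that the curve lines up with the middle of each bar.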
Upvotes: 1