Plotting histograms in Python using Matplotlib or Pandas

Question

I have gone through different posts on this forum, but I cannot find an answer to the behaviour I am seeing.

I have a csv file whose header has many entries, with 300 points each. For each field (column of the csv file) I would like to plot a histogram. The x-axis contains the elements of that column and the y-axis should show the number of samples that fall inside each bin. As I have 300 points, the total number of samples in all bins added together should be 300, so the y-axis should go from 0 to, let's say, 50 (just an example). However, the values are gigantic (400e8), which makes no sense.

Sample of the table:

point | mydata
1     | 250.23e-9
2     | 250.123e-9
...   | ...
300   | 251.34e-9

Please check my code, below. I am using pandas to open the csv and Matplotlib for the rest.

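import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

df=pd.read_csv("/home/pcardoso/raw_data/myData.csv")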

# Figure parameters
figPath='/home/pcardoso/scripts/python/matplotlib/figures/'
figPrefix='hist_'           # Prefix to the name of the file.
figSuffix='_something'      # Suffix to the name of the file.
figString=''    # Full string passed as the figure name to be saved

precision=3
num_bins = 50

columns=list(df)

for fieldName in columns:

    vectorData=df[fieldName]
    
    # statistical data
    mu = np.mean(vectorData)  # mean of distribution
    sigma = np.std(vectorData)  # standard deviation of distribution

    # Create plot instance
    fig, ax = plt.subplots()

    # Histogram
    n, bins, patches = ax.hist(vectorData, num_bins, density='True',alpha=0.75,rwidth=0.9, label=fieldName)
    ax.legend()
    
    # Best-fit curve
    y=mlab.normpdf(bins, mu, sigma)
    ax.plot(bins, y, '--')
    
    # Setting axis names, grid and title
    ax.set_xlabel(fieldName)
    ax.set_ylabel('Number of points')
    ax.set_title(fieldName + ': $\mu=$' + eng_notation(mu,precision) + ', $\sigma=$' + eng_notation(sigma,precision))
    ax.grid(True, alpha=0.2)
    
    fig.tight_layout()      # Tweak spacing to prevent clipping of ylabel
    
    # Saving figure
    figString=figPrefix + fieldName +figSuffix
    fig.savefig(figPath + figString)

plt.show()

plt.close(fig)

In summary, I would like to know how to get the y-axis values right.

Edit: 6 July 2020

Edit (08 June 2020): I would like the density estimator to follow the plot like this:

Dr. V · Accepted Answer

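Don't use density='True'. With that option, the value displayed is the number of members in the bin divided by the total number of samples and by the width of the bin, so that the bars integrate to 1. If that width is small (as in your case of rather small x-values, around 250e-9), the values become large. Note also that 'True' is a string here; it only behaves like True because a non-empty string is truthy. To get counts on the y-axis, leave the density option out.

Edit: Ok, to un-norm the normed curve, you need to multiply it by the number of points and by the width of one bin. I made a more reduced example: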
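To see the effect in numbers, here is a minimal sketch using np.histogram instead of a plot. The data are a made-up stand-in for one column of the csv (300 points around 250e-9, an assumption based on the sample table), so the exact values will differ from yours:

```python
import numpy as np

# Hypothetical stand-in for one csv column: 300 points near 250e-9
rng = np.random.default_rng(0)
data = rng.normal(250e-9, 0.5e-9, 300)

# Raw counts per bin, and the same bins with density normalisation
counts, edges = np.histogram(data, bins=50)
density, _ = np.histogram(data, bins=50, density=True)
widths = np.diff(edges)

total_count = counts.sum()               # adds up to the number of samples
total_area = (density * widths).sum()    # density bars integrate to 1
rescaled = density * data.size * widths  # undo the normalisation

print(total_count, total_area, density.max())
```

The counts add up to 300, while the density values are huge simply because each one is divided by a bin width of the order of 1e-10.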

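from numpy.random import normal
from scipy.stats import norm
import pylab

N = 300        # number of samples
sigma = 10.0   # standard deviation of the test data
B = 30         # number of bins

def main():
    x = normal(0, sigma, N)

    # No density option: the bar heights are raw counts
    h, bins, _ = pylab.hist(x, bins=B, rwidth=0.8)
    bin_width = bins[1] - bins[0]

    # Scale the normal pdf by N * bin_width so that it matches the counts
    h_n = norm.pdf(bins[:-1], 0, sigma) * N * bin_width
    pylab.plot(bins[:-1], h_n)
    pylab.show()

if __name__ == "__main__":
    main()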
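Applied to a DataFrame column as in the question, the same idea might look like the sketch below. The helper name plot_count_hist and the generated data are made up for illustration; scipy.stats.norm.pdf is used in place of mlab.normpdf, which was removed in Matplotlib 3.1. The Agg backend is only selected so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm


def plot_count_hist(values, num_bins=50, label=None):
    """Histogram of raw counts plus a normal curve scaled to the counts."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()

    fig, ax = plt.subplots()
    # No density option, so the y-axis shows the number of points per bin
    counts, edges, _ = ax.hist(values, num_bins, alpha=0.75, rwidth=0.9, label=label)

    # Evaluate the pdf at the bin centres and scale it by N * bin width
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_width = edges[1] - edges[0]
    curve = norm.pdf(centers, mu, sigma) * values.size * bin_width
    ax.plot(centers, curve, "--")

    ax.set_ylabel("Number of points")
    if label is not None:
        ax.set_xlabel(label)
        ax.legend()
    return fig, counts, curve


# Hypothetical data standing in for one column of the csv file
rng = np.random.default_rng(1)
df = pd.DataFrame({"mydata": rng.normal(250e-9, 0.5e-9, 300)})

fig, counts, curve = plot_count_hist(df["mydata"], label="mydata")
plt.close(fig)
```

In the original loop you would call this once per column and save the returned figure with fig.savefig before closing it.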
