np.histogram producing very small counts in specified bins

Question

I am trying to generate a simple histogram of a normal distribution using bins that are 1% of the mean wide, but I am getting incredibly small bin counts. Where am I going wrong in my use of np.histogram?

Here is the basic implementation:

import streamlit as st
import math
import pandas as pd
import numpy as np
from numpy.random import normal
import random
import matplotlib.pyplot as plt
import plotly.graph_objects as go

mean = 600000
uncertainty = 5.02
st_dev = mean * uncertainty/100

year1_dist = normal(mean, st_dev, 10000)

bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)

The values stored in hist are very small (they sum to something like 0.00017). I have also tried plotting the histogram with plotly using the following implementation, which returns the same results (very low frequency of occurrence):

fig = go.Figure(data=[go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram')])
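
One thing worth checking here is the `density=True` argument. With it, `np.histogram` returns a probability density (scaled so that the area under the bars is 1) rather than a count per bin, so with bins roughly 6000 units wide, values on the order of 1/6000 are expected, and a sum near 0.00017 is consistent with that. The sketch below compares the two modes using the same mean and spread as above; the fixed seed and the variable names `samples`, `density`, `counts` and `widths` are assumptions added for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
mean = 600000
st_dev = mean * 5.02 / 100
samples = rng.normal(mean, st_dev, 10000)

bin_size = mean * 0.01
nbins = math.ceil((samples.max() - samples.min()) / bin_size)

# density=True: values are probability densities (per unit of x), not counts
density, edges = np.histogram(samples, bins=nbins, density=True)
# density=False (the default): values are the number of samples in each bin
counts, _ = np.histogram(samples, bins=nbins)

widths = np.diff(edges)
area = float(np.sum(density * widths))  # area under the density, close to 1.0
total = int(counts.sum())               # equals the number of samples
density_sum = float(density.sum())      # close to 1 / bin width

print(area, total, density_sum, 1 / widths[0])
```

If per-bin counts are what is wanted, dropping `density=True` (or dividing the counts by `counts.sum()` to get fractions) is one way to get there.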

Ultimately, I would like to have a CDF overlaid on the histogram, resembling something like this. I know there will be some normalization involved on the histogram frequency, and I need to adjust my inputs a bit so that the mean is at zero.

I have the CDF generated and plotted as expected, and I am implementing the tool in streamlit. Here is the plotting section of my code, which shows the CDF and the histogram (albeit with the histogram values being very low):

    bin_size = mean * 0.01
    nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
    hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
    cdf = np.cumsum(hist * np.diff(bin_edges))
    fig = go.Figure(data=[
        go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram'),
        go.Scatter(x=bin_edges[1:], y=cdf, name='CDF')  # cdf has one value per bin, so pair it with the right-hand edges
    ])
    st.plotly_chart(fig, use_container_width=True)
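
For the overlay itself, one option is to put the histogram and the CDF on the same 0-to-1 scale by converting counts to per-bin fractions and taking their cumulative sum. The following is a minimal sketch under stated assumptions: the seed is fixed for reproducibility, "mean at zero" is read as expressing each sample as a percent deviation from the mean, and the names `pct_dev`, `prob` and `centers` are illustrative and not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
mean = 600000
st_dev = mean * 5.02 / 100
year1_dist = rng.normal(mean, st_dev, 10000)

# Assumed re-centering: percent deviation from the mean, so the mean sits at 0
pct_dev = (year1_dist - mean) / mean * 100

# Explicit 1%-wide bin edges on whole percentages that cover every sample
lo = np.floor(pct_dev.min())
hi = np.ceil(pct_dev.max())
bin_edges = np.arange(lo, hi + 1, 1.0)

counts, bin_edges = np.histogram(pct_dev, bins=bin_edges)
prob = counts / counts.sum()  # fraction of samples falling in each bin
cdf = np.cumsum(prob)         # cumulative fraction up to each right-hand edge
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

print(len(bin_edges) - 1, prob.max(), cdf[-1])
```

These arrays could then be handed to plotly, for example `go.Bar(x=centers, y=prob)` for the histogram and `go.Scatter(x=bin_edges[1:], y=cdf)` for the CDF. Passing explicit edges to `np.histogram` fixes the bin width at exactly 1%, which `nbinsx` on `go.Histogram` does not guarantee since it only sets a maximum number of bins.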

