Reputation: 9
I am trying to generate a simple histogram of a normal distribution using bins that are 1% of the mean wide, but I am getting incredibly small bin values. Where am I messing up the implementation of np.histogram?
Here is the basic implementation:
import streamlit as st
import math
import pandas as pd
import numpy as np
from numpy.random import normal
import random
import matplotlib.pyplot as plt
import plotly.graph_objects as go
mean = 600000
uncertainty = 5.02
st_dev = mean * uncertainty/100
year1_dist = normal(mean, st_dev, 10000)
bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
The values stored in hist are very small (they sum to something like 0.00017). I have also tried plotting the histogram using plotly with the following implementation, which returns the same results (very low frequency of occurrence):
fig = go.Figure(data=[go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram')])
Ultimately, I would like to have a CDF overlaid on the histogram to resemble something like this, though I know there will be some normalization involved on the histogram frequency, and I need to reset my inputs a bit to have a mean at zero.
I have the CDF generated and plotted as expected, and I am implementing the tool in streamlit. Here is the plotting section of my code, which shows the CDF and histogram (albeit with the histogram values being very low):
bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
cdf = np.cumsum(hist * np.diff(bin_edges))
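fig = go.Figure(data=[
    go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram'),
    # cdf has one value per bin, while bin_edges has nbins + 1 entries,
    # so plot the CDF against the right-hand edge of each bin
    go.Scatter(x=bin_edges[1:], y=cdf, name='CDF')
])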
st.plotly_chart(fig, use_container_width=True)
Upvotes: -1
Views: 749
Reputation: 20629
From the docs:
density bool, optional
If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1.
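To make the quoted definition concrete, here is a small sketch of how the density values relate to the raw counts. The seed, the sample parameters, and the names counts, density, widths and manual_density are assumptions chosen for illustration, not part of the question's code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
samples = rng.normal(600_000, 30_120, 10_000)

# Same data and same number of bins, so both calls share the same edges
counts, edges = np.histogram(samples, bins=50)
density, _ = np.histogram(samples, bins=50, density=True)

widths = np.diff(edges)
# density = count / (total number of samples * bin width)
manual_density = counts / (counts.sum() * widths)

print(np.allclose(density, manual_density))  # True
```

Because each density value is divided by the bin width (several thousand here), the individual values are tiny even though the histogram is correctly normalized.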
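I suspect you did np.sum(hist) and got your 0.00017, which is roughly 1/bin_size. np.sum(hist)*bin_size should give you approximately the correct value of 1. It is not exactly 1 because nbins is rounded up, so the actual bin width is slightly smaller than bin_size; np.sum(hist * np.diff(bin_edges)) uses the actual widths and gives 1.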
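A minimal sketch of this check, assuming the question's parameters with a fixed seed added for reproducibility. The names rng, widths and prob are illustrative and do not appear in the original post.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
mean = 600_000
st_dev = mean * 5.02 / 100
year1_dist = rng.normal(mean, st_dev, 10_000)

bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)

widths = np.diff(bin_edges)  # actual bin widths used by np.histogram
prob = hist * widths         # probability mass that falls in each bin
cdf = np.cumsum(prob)        # one value per bin, aligned with bin_edges[1:]

print(hist.sum())  # small: roughly 1 / bin width
print(prob.sum())  # 1 up to floating-point rounding
print(cdf[-1])     # the CDF ends at 1
```

If per-bin probabilities are what should appear on the plot, prob can be plotted directly against the bin centres; with density=False the histogram returns raw counts instead.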
Upvotes: 2