H.H

Reputation: 631

How can I compute the area under a histogram after a certain value?

I am using the line below to compute the area under the whole histogram. However, I cannot find resources that explain how to calculate the area under the histogram after a certain value or within a particular interval. Any ideas? Here x is my data, bins are the bin edges, and values are the probabilities of occurrence.

area = sum(np.diff(bins)*values)

Upvotes: 1

Views: 2047

Answers (3)

Avi Vajpeyi

Reputation: 698

Using A.Rauan's approach, here is a visualisation of a histogram and the area after a certain value:

[Figure: Area of hist after various values]

import numpy as np
import matplotlib.pyplot as plt


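def find_bin_idx_of_value(bins, value):
    """Finds the index of the bin that contains the value."""
    array = np.asarray(value)
    idx = np.digitize(array, bins)
    # digitize returns 0 for values below the first edge; clamp so idx - 1 is never negative
    return np.maximum(idx - 1, 0)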

def area_after_val(counts, bins, val):
    """Calculates the area of the hist from the bin containing val onwards."""
    left_bin_edge_index = find_bin_idx_of_value(bins, val)
    # Assumes equal-width bins, so the first width applies to every bin
    bin_width = np.diff(bins)[0]
    area = sum(bin_width * counts[left_bin_edge_index:])
    return area

def add_area_line_to_plot(axes, counts, bins, val):
    """Adds a vertical line and labels it with the value and area after that line"""
    area = area_after_val(counts, bins, val)
    axes.axvline(val, color='r', label=f"val={val:.2f}, Area={area:.2f}")


def main():
    num_data_points, loc, scale = 1000, 40, 20
    data = np.random.normal(loc, scale, num_data_points)
    fig, ax = plt.subplots()
    counts, bins, _ = ax.hist(data, bins=20, alpha=0.3, density=True, label="Data")
    add_area_line_to_plot(ax, counts, bins, val=min(data))
    add_area_line_to_plot(ax, counts, bins, val=np.mean(data))
    add_area_line_to_plot(ax, counts, bins, val=np.mean(data)*2)
    add_area_line_to_plot(ax, counts, bins, val=np.mean(data)*3)
    ax.legend()
    plt.show()

if __name__ == "__main__":
    main()
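
As a sanity check on this kind of calculation: when the histogram is built with density=True, the area from a bin edge onwards should equal the fraction of the data at or above that edge. The sketch below checks this with np.histogram; the seed, the sample, and the chosen edge index k are illustrative assumptions, not part of the answer above.

```python
import numpy as np

# Illustrative data (assumed): 1000 draws from a normal distribution
rng = np.random.default_rng(0)
data = rng.normal(40, 20, 1000)

# density=True makes the bar heights integrate to 1 over the full range
counts, bins = np.histogram(data, bins=20, density=True)
widths = np.diff(bins)

k = 10  # hypothetical edge index to start from
area_after_edge = np.sum(widths[k:] * counts[k:])

# Fraction of samples at or above the same edge
fraction_after_edge = np.mean(data >= bins[k])

print(area_after_edge, fraction_after_edge)
```

If the cut-off falls inside a bin rather than on an edge, the two numbers differ by at most that one bin's area, because the whole bin is counted.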

Upvotes: 1

A.Rauan

Reputation: 42

area = sum(np.diff(bins)[0]*values[start:])

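np.diff(bins) gives the width of each bin along the x-axis. When the bins are equally spaced, all the widths are the same, so you can take the first element. Here start is the index of the first bin you want to include.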
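
One possible way to get start from a cut-off value is np.searchsorted on the bin edges. This is only a sketch: the helper name area_from_value and the toy bins and values are assumptions made for illustration, and it counts the whole bin that contains the cut-off.

```python
import numpy as np

def area_from_value(values, bins, threshold):
    """Area of an equal-width histogram from the bin containing threshold onwards."""
    values = np.asarray(values, dtype=float)
    bins = np.asarray(bins, dtype=float)
    # Index of the bin whose left edge is the last edge <= threshold
    start = np.searchsorted(bins, threshold, side="right") - 1
    # Clamp so thresholds outside the histogram range behave sensibly
    start = int(np.clip(start, 0, len(values)))
    # Equal-width assumption: the first width applies to every bin
    return float(np.sum(np.diff(bins)[0] * values[start:]))

# Toy histogram (assumed): four bins of width 1 with total area 1
bins = [0, 1, 2, 3, 4]
values = [0.1, 0.2, 0.3, 0.4]
print(area_from_value(values, bins, 2.5))
```

A threshold below the first edge returns the total area, and a threshold above the last edge returns 0.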

Upvotes: 0

maxymoo

Reputation: 36545

I believe np.diff(bins) is a 1-D NumPy array, in which case you can slice it as np.diff(bins)[start:end] for an interval, and as np.diff(bins)[start:] for all bins from index start onwards. Slice values the same way before multiplying.
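
A sketch of the slicing idea, plus a variant that also handles interval ends falling inside a bin by counting only the overlapping part of each bin. The function names and the toy histogram are hypothetical; the second function assumes the histogram is piecewise constant within each bin.

```python
import numpy as np

def area_whole_bins(values, bins, start, end=None):
    """Area of bins start..end-1, using each bin's own width."""
    widths = np.diff(bins)
    return float(np.sum(widths[start:end] * np.asarray(values)[start:end]))

def area_between(values, bins, lo, hi=np.inf):
    """Area between x = lo and x = hi, pro-rating partially covered bins."""
    bins = np.asarray(bins, dtype=float)
    left, right = bins[:-1], bins[1:]
    # Length of the overlap between each bin and [lo, hi]; zero if disjoint
    overlap = np.clip(np.minimum(right, hi) - np.maximum(left, lo), 0.0, None)
    return float(np.sum(overlap * np.asarray(values, dtype=float)))

# Toy histogram (assumed) with unequal bin widths and total area 1
bins = [0, 1, 3, 6]
values = [0.2, 0.1, 0.2]
print(area_whole_bins(values, bins, 1))      # bins 1 and 2
print(area_between(values, bins, 0.5, 4.0))  # partial first and last bin
```

Using the per-bin widths rather than the first width keeps the result correct when the bins are not equally spaced.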

Upvotes: 0
