ZR-
ZR-

Reputation: 819

Is there a way I can find the range of local maxima of histogram?

I'm wondering if there's a way I can find the range of local maxima of a histogram. For instance, suppose I have the following histogram (just ignore the orange curve):enter image description here The histogram is actually obtained from a dictionary. I'm hoping to find the range of local maxima of this histogram (on the horizontal axis), which are, say, 1.3-1.6, and 2.1-2.4 in this case. I have no idea which tools would be helpful or which techniques I may want to use. I know there's a tool to find local maxima of a 1-D array:

from scipy.signal import argrelextrema
x = np.random.random(12)
argrelextrema(x, np.greater)

but I don't think it would work here since I'm looking for a range, and there're some 'wiggles' on the histogram. Can anyone give me some suggestions/examples about how I can obtain the range I'm looking for? Thanks a lot for the help

PS: I trying to not just search for the ranges of x whose y values are above a certain limit:)

Upvotes: 2

Views: 2356

Answers (1)

Max Pierini
Max Pierini

Reputation: 2249

I don't know if I correctly understand what you want to do, but you can treat the histogram as a Probability Density Function (PDF) of a bimodal distribution, then find the modes and the Highest Density Intervals (HDIs) around the two modes.

So, I create some sample data

import numpy as np
import pandas as pd
import scipy.stats as sps
from scipy.signal import find_peaks, argrelextrema
import matplotlib.pyplot as plt

d1 = sps.norm(loc=1.3, scale=.2)
d2 = sps.norm(loc=2.2, scale=.3)

r1 = d1.rvs(size=5000, random_state=1)
r2 = d2.rvs(size=5000, random_state=1)

r = np.concatenate((r1, r2))

h = plt.hist(r, bins=100, density=True);

enter image description here

We have only h, the result of the hist function that will contains the density (100) and the ranges of the bins (101).

print(h[0].size)
100
print(h[1].size)
101

So we first need to choose the mean of each bin

density = h[0]
values = h[1][:-1] + np.diff(h[1])[0] / 2

plt.hist(r, bins=100, density=True, alpha=.25)
plt.plot(values, density);

enter image description here

Now we can normalize the PDF (to sum to 1) and smooth the data with moving average that we'll use only to get the peaks (maxima) and minima

norm_density = density / density.sum()
norm_density_ma = pd.Series(norm_density).rolling(7, center=True).mean().values

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density);

enter image description here

Now we can obtain indexes of maxima

peaks = find_peaks(norm_density_ma)[0]
peaks
array([24, 57])

and minima

minima = argrelextrema(norm_density_ma, np.less)[0]
minima
array([40])

and check they're correct

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
for peak in peaks:
    plt.axvline(values[peak], color='r')
plt.axvline(values[minima], color='k', ls='--');

enter image description here

Finally, we have to find out the HDIs around the two modes (peaks) from the normalized h histogram data. We can use a simple function to get the HDI of grid (see HDI_of_grid for details and Doing Bayesian Data Analysis by John K. Kruschke)

def HDI_of_grid(probMassVec, credMass=0.95):
    sortedProbMass = np.sort(probMassVec, axis=None)[::-1]
    HDIheightIdx = np.min(np.where(np.cumsum(sortedProbMass) >= credMass))
    HDIheight = sortedProbMass[HDIheightIdx]
    HDImass = np.sum(probMassVec[probMassVec >= HDIheight])
    idx = np.where(probMassVec >= HDIheight)[0]
    return {'indexes':idx, 'mass':HDImass, 'height':HDIheight}

Let's say we want the HDIs to contain a mass of 0.3

# HDI around the 1st mode
hdi1 = HDI_of_grid(norm_density, credMass=.3)

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']], 
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')

enter image description here

for the 2nd mode, we'll get HDI from minima to avoid the 1st mode

# HDI around the 2nd mode
hdi2 = HDI_of_grid(norm_density[minima[0]:], credMass=.3)

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']], 
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
plt.fill_between(
    values[hdi2['indexes']+minima], 
    0, norm_density[hdi2['indexes']+minima],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')

enter image description here

And we have the values of the two HDIs

# 1st mode
values[peaks[0]]
1.320249129265321
# 0.3 HDI
values[hdi1['indexes']].take([0, -1])
array([1.12857599, 1.45715851])

# 2nd mode
values[peaks[1]]
2.2238510564735363
# 0.3 HDI
values[hdi2['indexes']+minima].take([0, -1])
array([1.95003229, 2.47028795])

Upvotes: 6

Related Questions