Reputation: 819

Is there a way I can find the range of local maxima of histogram?

I'm wondering if there's a way I can find the range of local maxima of a histogram. For instance, suppose I have the following histogram (just ignore the orange curve): The histogram is actually obtained from a dictionary. I'm hoping to find the range of local maxima of this histogram (on the horizontal axis), which are, say, 1.3-1.6, and 2.1-2.4 in this case. I have no idea which tools would be helpful or which techniques I may want to use. I know there's a tool to find local maxima of a 1-D array:

from scipy.signal import argrelextrema
x = np.random.random(12)
argrelextrema(x, np.greater)

but I don't think it would work here since I'm looking for a range, and there're some 'wiggles' on the histogram. Can anyone give me some suggestions/examples about how I can obtain the range I'm looking for? Thanks a lot for the help

PS: I trying to not just search for the ranges of x whose y values are above a certain limit:)

Upvotes: 2

Answers (1)

Max Pierini

Reputation: 2269

I don't know if I correctly understand what you want to do, but you can treat the histogram as a Probability Density Function (PDF) of a bimodal distribution, then find the modes and the Highest Density Intervals (HDIs) around the two modes.

So, I create some sample data

import numpy as np
import pandas as pd
import scipy.stats as sps
from scipy.signal import find_peaks, argrelextrema
import matplotlib.pyplot as plt

d1 = sps.norm(loc=1.3, scale=.2)
d2 = sps.norm(loc=2.2, scale=.3)

r1 = d1.rvs(size=5000, random_state=1)
r2 = d2.rvs(size=5000, random_state=1)

r = np.concatenate((r1, r2))

h = plt.hist(r, bins=100, density=True);

We have only h, the result of the hist function that will contains the density (100) and the ranges of the bins (101).

print(h[0].size)
100
print(h[1].size)
101

So we first need to choose the mean of each bin

density = h[0]
values = h[1][:-1] + np.diff(h[1])[0] / 2

plt.hist(r, bins=100, density=True, alpha=.25)
plt.plot(values, density);

Now we can normalize the PDF (to sum to 1) and smooth the data with moving average that we'll use only to get the peaks (maxima) and minima

norm_density = density / density.sum()
norm_density_ma = pd.Series(norm_density).rolling(7, center=True).mean().values

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density);

Now we can obtain indexes of maxima

peaks = find_peaks(norm_density_ma)[0]
peaks
array([24, 57])

and minima

minima = argrelextrema(norm_density_ma, np.less)[0]
minima
array([40])

and check they're correct

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
for peak in peaks:
    plt.axvline(values[peak], color='r')
plt.axvline(values[minima], color='k', ls='--');

Finally, we have to find out the HDIs around the two modes (peaks) from the normalized h histogram data. We can use a simple function to get the HDI of grid (see HDI_of_grid for details and Doing Bayesian Data Analysis by John K. Kruschke)

def HDI_of_grid(probMassVec, credMass=0.95):
    sortedProbMass = np.sort(probMassVec, axis=None)[::-1]
    HDIheightIdx = np.min(np.where(np.cumsum(sortedProbMass) >= credMass))
    HDIheight = sortedProbMass[HDIheightIdx]
    HDImass = np.sum(probMassVec[probMassVec >= HDIheight])
    idx = np.where(probMassVec >= HDIheight)[0]
    return {'indexes':idx, 'mass':HDImass, 'height':HDIheight}

Let's say we want the HDIs to contain a mass of 0.3

# HDI around the 1st mode
hdi1 = HDI_of_grid(norm_density, credMass=.3)

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']], 
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')

for the 2nd mode, we'll get HDI from minima to avoid the 1st mode

# HDI around the 2nd mode
hdi2 = HDI_of_grid(norm_density[minima[0]:], credMass=.3)

plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']], 
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
plt.fill_between(
    values[hdi2['indexes']+minima], 
    0, norm_density[hdi2['indexes']+minima],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')

And we have the values of the two HDIs

# 1st mode
values[peaks[0]]
1.320249129265321
# 0.3 HDI
values[hdi1['indexes']].take([0, -1])
array([1.12857599, 1.45715851])

# 2nd mode
values[peaks[1]]
2.2238510564735363
# 0.3 HDI
values[hdi2['indexes']+minima].take([0, -1])
array([1.95003229, 2.47028795])

Upvotes: 6

Is there a way I can find the range of local maxima of histogram?

Answers (1)

Related Questions