matousc

Reputation: 3977

Efficient way to find the maximum points in an array with sufficient distance from each other with NumPy

I have an array of numbers (like a time series) and I want to find a certain number of high values in it. The high values (local maxima) should be far enough from each other. My solution:

  1. find the maximum of the whole array
  2. clear the surroundings of that maximum (overwrite the values with any small value)
  3. repeat until enough maxima are found

The code:

import numpy as np
import matplotlib.pyplot as plt


MIN_DIST = 4
TO_FIND = 2

np.random.seed(103)

x = np.random.normal(0, 1, 20)
x[2] = 4
x[3] = 4.1
x[13] = 3.9



plt.plot(x)
plt.show()


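locs = []
for idx in range(TO_FIND):
    # index of the largest remaining value
    loc = x.argmax()
    # overwrite the neighbourhood (in place) so it cannot be picked again;
    # the slice end is exclusive, so index loc+MIN_DIST itself is kept
    x[max(0,loc-MIN_DIST):min(loc+MIN_DIST,len(x))] = -2
    locs.append(loc)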


print(locs)


The printed answer is correct: 3 and 13.

In the example above there are two values that are too close to each other, at index 2 and 3, so they should be counted only once, as a maximum at index 3 (the bigger value). The second maximum I want to find is at index 13.
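For reference, the same greedy idea can be wrapped in a function. This is only a sketch: the name `greedy_maxima` is made up for illustration, and two details are assumptions rather than part of the loop in this question. It works on a copy instead of overwriting the input, and it masks with `-np.inf` over a symmetric window, so only samples strictly closer than `min_dist` are excluded.

```python
import numpy as np


def greedy_maxima(values, to_find, min_dist):
    """Greedily pick up to `to_find` maxima that are at least `min_dist` apart.

    Hypothetical helper: name and signature are illustrative only.
    """
    work = np.array(values, dtype=float)  # copy, so the caller's array is untouched
    locs = []
    for _ in range(to_find):
        loc = int(work.argmax())
        if work[loc] == -np.inf:
            break  # everything left is already masked out
        locs.append(loc)
        # mask every sample strictly closer than min_dist to the chosen index
        lo = max(0, loc - min_dist + 1)
        hi = min(len(work), loc + min_dist)
        work[lo:hi] = -np.inf
    return locs


# deterministic stand-in for the random example: zeros plus the three spikes
x = np.zeros(20)
x[2] = 4.0
x[3] = 4.1
x[13] = 3.9
print(greedy_maxima(x, to_find=2, min_dist=4))  # [3, 13]
```

Masking with `-np.inf` instead of a fixed small number avoids picking an already cleared sample when the remaining data happens to lie below that number.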

The code provided works well. However, it feels like a naive way to do it. Is there any NumPy or mathematical trick (even a dirty one counts) to achieve this more cheaply?

A rough comparison with scipy.signal.find_peaks:

import numpy as np
import matplotlib.pyplot as plt
import time
from scipy.signal import find_peaks

N = 10000
MIN_DIST = 4
TO_FIND = 2

t1 = 0
t2 = 0

correct = []
for k in range(N):
    y = np.random.normal(0, 1, 10000)
    y[3] = 5
    y[4] = 5.1
    y[11] = 4.9

#     plt.plot(y)
#     plt.show()

    t0 = time.time()
    peaks, _ = find_peaks(y, distance=MIN_DIST)
    t1 += time.time() - t0

    t0 = time.time()
    x = y.copy()
    locs = []
    for idx in range(TO_FIND):
        loc = x.argmax()
        x[max(0,loc-MIN_DIST):min(loc+MIN_DIST,len(x))] = -2
        locs.append(loc)
    t2 += time.time() - t0

    same_answers = all([a == b for a, b in zip(locs, peaks[:TO_FIND])])

    correct.append(same_answers)

print("Correct (same answers):", all(correct))
print("find_peaks:", t1)
print("default:", t2)

In this test find_peaks is about 3.7 times slower:

Correct (same answers): True

find_peaks: 3.137294292449951

default: 0.8532450199127197

Also, if I remove the "fake samples", so that the maxima are no longer so clear, the two methods do not give the same results.
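One likely reason for the mismatch (an assumption, not checked against the runs above) is ordering: `find_peaks` returns peak indices in ascending position, so `peaks[:TO_FIND]` holds the leftmost peaks, not the highest ones. A small hand-made array, chosen only for illustration, shows the effect:

```python
import numpy as np
from scipy.signal import find_peaks

# three isolated peaks: a small one first, then the two large ones
y = np.array([0, 1, 0, 0, 0, 0, 5, 0, 0, 0, 0, 3, 0], dtype=float)

peaks, _ = find_peaks(y, distance=4)
print(peaks)      # [ 1  6 11] -> sorted by position, not by height
print(peaks[:2])  # [1 6]      -> includes the small peak at index 1

# the greedy argmax loop would instead return the two largest values
print(int(y.argmax()))  # 6, with 11 as the next pick
```

With the fake samples placed near the start of the array the two orderings happen to coincide, which would explain why the check above reports matching answers.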

Upvotes: 1

Views: 149

Answers (1)

Marco Cerliani

Reputation: 22031

I suggest this solution using scipy:

import numpy as np
from scipy.signal import find_peaks

np.random.seed(103)

x = np.random.normal(0, 1, 20)
x[2] = 4
x[3] = 4.1
x[13] = 3.9

MIN_DIST = 4
peaks, _ = find_peaks(x, distance=MIN_DIST, height=3)
peaks
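If a fixed number of maxima is wanted rather than everything above a `height` threshold, the peaks can be ranked by their values afterwards. The following is a minimal sketch: the helper name `top_peaks` is hypothetical, and the test array is a deterministic stand-in for the random data above.

```python
import numpy as np
from scipy.signal import find_peaks


def top_peaks(values, k, min_dist):
    """Return the indices of the k highest peaks that are at least min_dist apart.

    Hypothetical helper built on scipy.signal.find_peaks.
    """
    values = np.asarray(values, dtype=float)
    peaks, _ = find_peaks(values, distance=min_dist)
    # rank the surviving peaks by height, highest first, and keep k of them
    order = np.argsort(values[peaks])[::-1][:k]
    return peaks[order]


x = np.zeros(20)
x[2] = 4.0
x[3] = 4.1
x[8] = 1.0   # a small extra peak that should be ignored
x[13] = 3.9
print(top_peaks(x, k=2, min_dist=4))  # [ 3 13]
```

Note that `find_peaks` only reports local maxima in the interior of the array, so a largest value sitting on the first or last sample would not be returned by this approach.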

Upvotes: 1
