Ryanc88

Reputation: 192

Replace all elements of Numpy array greater than threshold with average of X adjacent values

I have this NumPy array that contains a data set:

array = np.array([3147, 3228, 3351, 3789, 4562, 4987, 5688, 6465, 7012, 7560, 7976, 8615, 8698, 8853, 8783, 8949, 9066, 9123, 9172, 9411, 9717, 9696, 9848,10113, 10154, 10227, 10439, 10672, 10287, 10386, 10417, 10585, 10607,10461, 10654, 10739, 10634, 10490, 10544, 10645, 10392, 10330, 10044, 9560, 8711, 8152, 7506, 7191, 6994, 6601, 6609, 6670, 7293, 32767, 7264, 7262, 7503 ,7872, 7826, 8037])

When plotted, the data forms a smooth curve except for a spike at the outlier value of 32767. Currently I have this, which sets any pixel greater than a threshold value of 16384 to zero:

array[array > 16384] = 0

How can I change this so that, when a pixel is above the threshold, the replacement value is the average of the X values to its left and the X values to its right? If the outlier is at the very first or very last index, the average should just use the side that has values. There could also be multiple values greater than the threshold (in this example there is only one).

With the example input and 2 adjacent values on each side, the replacement would be calculated as (6670 + 7293 + 7264 + 7262)/4 = 7122.25, which is stored as 7122 in the integer array and gives this result:

array = np.array([3147, 3228, 3351, 3789, 4562, 4987, 5688, 6465, 7012, 7560, 7976, 8615, 8698, 8853, 8783, 8949, 9066, 9123, 9172, 9411, 9717, 9696, 9848,10113, 10154, 10227, 10439, 10672, 10287, 10386, 10417,10585, 10607,10461, 10654, 10739, 10634, 10490, 10544, 10645, 10392, 10330, 10044, 9560, 8711, 8152, 7506, 7191, 6994, 6601, 6609, 6670, 7293, 7122, 7264, 7262, 7503 ,7872, 7826, 8037])
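As a quick check of the arithmetic in the expected output, here is a minimal sketch that averages the four neighbours of the 32767 spike. The variable names are only for illustration.

```python
import numpy as np

# The two values on each side of the 32767 spike in the example.
neighbours = np.array([6670, 7293, 7264, 7262])

mean = neighbours.mean()
print(mean)       # 7122.25
print(int(mean))  # 7122, the truncated value that ends up in an integer array
```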

Thanks!

Upvotes: 1

Views: 684

Answers (2)

nathancy

Reputation: 46630

This would work

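import numpy as np

def remove_outlier_pixels(array, adjacent=2):
    # Flat indices of every value above the threshold.
    outliers = np.flatnonzero(array > 16384)
    for outlier in outliers:
        # Clamp the start at 0: a negative start would wrap around
        # and produce an empty slice (e.g. for an outlier at index 1).
        left = array[max(outlier - adjacent, 0):outlier]
        right = array[outlier + 1:outlier + adjacent + 1]
        # Mean of the neighbours that exist; truncated when stored in an integer array.
        array[outlier] = (left.sum() + right.sum()) / (left.size + right.size)
    return array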

This replaces every pixel greater than the threshold with the average of its X left and right adjacent values. It also handles the edge case where the outlier is at the first or last index, by averaging only the neighbours that exist. Note that the array is modified in place.

Using this input

[99999 3228 3351 3789 4562 4987 5688 6465 7012 7560 7976 8615 8698 8853 8783 8949 9066 37000 9172 9411 9717 9696 9848 10113 10154 10227 10439 10672 10287 10386 10417 10585 10607 10461 10654 10739 10634 10490 10544 10645 10392 10330 10044 9560 8711 8152 7506 7191 6994 6601 6609 6670 7293 32767 7264 7262 7503 7872 7826 88888]

We get

[ 3289 3228 3351 3789 4562 4987 5688 6465 7012 7560 7976 8615 8698 8853 8783 8949 9066 9149 9172 9411 9717 9696 9848 10113 10154 10227 10439 10672 10287 10386 10417 10585 10607 10461 10654 10739 10634 10490 10544 10645 10392 10330 10044 9560 8711 8152 7506 7191 6994 6601 6609 6670 7293 7122 7264 7262 7503 7872 7826 7849]
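If two outliers sit next to each other, a plain neighbour average would still include the other spike. Below is a minimal sketch of one way around that, assuming it is acceptable to average only the neighbours within `adjacent` positions that are not themselves above the threshold. The function name `replace_outliers` and its `threshold` parameter are my own and not part of the question.

```python
import numpy as np

def replace_outliers(values, threshold=16384, adjacent=2):
    """Replace each value above `threshold` with the mean of the
    non-outlier values within `adjacent` positions on either side."""
    values = np.asarray(values)
    result = values.astype(float)       # astype copies, so the input is untouched
    is_outlier = values > threshold
    for i in np.flatnonzero(is_outlier):
        lo = max(i - adjacent, 0)                 # clamp at the start of the array
        hi = min(i + adjacent + 1, values.size)   # clamp at the end of the array
        window = values[lo:hi]
        good = window[~is_outlier[lo:hi]]         # drops index i and any other outliers
        if good.size:                             # leave the value alone if nothing usable
            result[i] = good.mean()
    return result

data = np.array([99999, 3228, 3351, 50000, 40000, 7264, 7262, 88888])
out = replace_outliers(data)
print(out)
```

This version returns a float array rather than writing back into the integer input, so the fractional part of each average is kept.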

Upvotes: 2

Anna Nevison

Reputation: 2759

You can do:

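X = 2  # number of adjacent values on each side
# Average the X values on each side of index i (2*X values in total).
calc_avg = lambda i: sum(array[i+a] + array[i-a] for a in range(1, X+1)) / (2*X)
# np.where returns a tuple; [0] is the array of all outlier indices.
array[array > 16384] = [calc_avg(i) for i in np.where(array > 16384)[0]]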

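This may run into issues, though, if an outlier does not have X values on both sides of it: near the end of the array the index runs out of bounds, and near the start the negative indices wrap around to the end of the array!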
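One way to avoid the edge problem without a Python loop is to compute windowed sums and counts with `np.convolve`. This is only a sketch under a few assumptions: the array has at least `2 * adjacent + 1` elements, a float result is acceptable, and neighbours that are themselves outliers should be ignored. The function name `replace_outliers_convolve` is hypothetical.

```python
import numpy as np

def replace_outliers_convolve(values, threshold=16384, adjacent=2):
    """Vectorised replacement of values above `threshold` by the mean of
    the non-outlier values within `adjacent` positions on either side.
    Assumes len(values) >= 2 * adjacent + 1."""
    values = np.asarray(values, dtype=float)
    is_outlier = values > threshold
    kernel = np.ones(2 * adjacent + 1)

    # Windowed sum of the good values (outliers contribute 0) ...
    sums = np.convolve(np.where(is_outlier, 0.0, values), kernel, mode="same")
    # ... and how many good values each window holds. mode="same" pads with
    # zeros, so windows at the edges simply count fewer values.
    counts = np.convolve((~is_outlier).astype(float), kernel, mode="same")

    result = values.copy()
    fixable = is_outlier & (counts > 0)   # skip outliers with no usable neighbour
    result[fixable] = sums[fixable] / counts[fixable]
    return result

middle = replace_outliers_convolve([6609, 6670, 7293, 32767, 7264, 7262, 7503])
edge = replace_outliers_convolve([99999, 3228, 3351, 3789, 4562])
print(middle[3], edge[0])
```

The sums and counts are computed for every position, which is wasted work when outliers are rare, but it keeps the whole operation in NumPy.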

Upvotes: 1
