helloworld

Reputation: 89

How can I detect sequential values in a NumPy array and process them?

I have a NumPy array that consists of groups of sequential values, and I would like to find the median value (or the closest integer) of each group. After that, I want to create new arrays by subtracting a value from and adding it to each median.

Example: data=[100,101,102,103,170,171,172,252,253,254,255,256,333,334,335]

Demand:

the median value of the first group (closest to the median): 103,

the median value of the second: 171,

the median value of the third: 254,

the median value of the fourth: 334

I want to subtract and add the same value to each of those numbers, say 20, so that:

final_array =[(83,123), (151,191), (234,274), (314, 354)]

It does not have to be the exact median, but it must be a number that occurs in the sublist. How can I do this in Python?

Thanks in advance...

Upvotes: 0

Views: 615

Answers (3)

JohanC

Reputation: 80329

Here is a full NumPy approach that avoids the unequal-length arrays created by np.split. (It only makes a difference if the data is really huge.)

import numpy as np

data = [100, 101, 102, 103, 170, 171, 172, 252, 253, 254, 255, 256, 333, 334, 335]
data = np.array(data)
# make a list of indices where the step is different from 1
change_pos = np.argwhere(np.diff(data) != 1).squeeze()
# the start of each group is at change_pos+1, and also at position 0
starts = np.append(0, change_pos + 1)
# the end of each group is at change_pos and also at the very end
ends = np.append(change_pos, len(data) - 1)
# the medians are the rounded mean of the start and end values
medians = (data[starts] + data[ends] + 1) // 2
num = 20
# create two columns, subtracting and adding num
final_array = np.c_[medians - num, medians + num]
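
As a quick check of the steps above, here is a minimal sketch that wraps them in a function and runs it on the sample data. The function name group_windows is hypothetical, and np.flatnonzero is swapped in for argwhere(...).squeeze() as an equivalent way to get the 1-D index array. Note that the midpoint of the first group comes out as 102, not the 103 written in the question.

```python
import numpy as np

def group_windows(data, num=20):
    """Return (center - num, center + num) for each run of consecutive integers."""
    data = np.asarray(data)
    # indices i where data[i + 1] - data[i] != 1, i.e. the last index of each run but the final one
    change_pos = np.flatnonzero(np.diff(data) != 1)
    # each run starts right after a change, and the first run starts at 0
    starts = np.append(0, change_pos + 1)
    # each run ends at a change, and the last run ends at the final index
    ends = np.append(change_pos, len(data) - 1)
    # midpoint of the first and last value of each run, rounded up on ties
    centers = (data[starts] + data[ends] + 1) // 2
    return np.c_[centers - num, centers + num]

data = [100, 101, 102, 103, 170, 171, 172, 252, 253, 254, 255, 256, 333, 334, 335]
windows = group_windows(data)
print(windows.tolist())
# [[82, 122], [151, 191], [234, 274], [314, 354]]
```

This sketch assumes a non-empty, sorted 1-D input; an empty array would fail at the indexing step.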

Upvotes: 0

ImSo3K

Reputation: 872

You can do something like this:

First, let's split the main array into sequential sub-arrays:

splitted_data = np.array(np.split(data, np.where(np.diff(data) != 1)[0]+1), dtype=object)

Essentially, we look for the positions where the difference between two neighbouring numbers is not 1 and split the array at each of them.

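The 1 in the != 1 comparison can of course be changed if you are looking for sequences with a different step. The + 1 after np.where only shifts each split index to the first element of the next group, so it should stay as it is.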
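
To see which constant controls what, here is a minimal sketch of the splitting step with the expected difference pulled out as a parameter. The function name split_runs and its step argument are introduced here for illustration and are not part of the answer.

```python
import numpy as np

def split_runs(data, step=1):
    """Split a 1-D sequence into runs whose neighbours differ by exactly `step`."""
    data = np.asarray(data)
    # positions i where data[i + 1] - data[i] != step (last index of a run)
    breaks = np.where(np.diff(data) != step)[0]
    # + 1 turns "last index of a run" into "first index of the next run"
    return np.split(data, breaks + 1)

runs = split_runs([10, 12, 14, 30, 32, 50], step=2)
print([r.tolist() for r in runs])
# [[10, 12, 14], [30, 32], [50]]
```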

Now, since splitted_data is an np.array holding sub-arrays of different lengths, np.median won't work "as-is", so let's np.vectorize that function:

vectorized_med = np.vectorize(np.median)

Then extract the medians with the vectorized function and round them to meet the closest-integer requirement:

medians = np.round(vectorized_med(splitted_data))

Now you can construct your final array with a list comprehension:

num = 20
final_array = np.array([(i - num, i + num) for i in medians])

final output:

array([[ 82., 122.],
       [151., 191.],
       [234., 274.],
       [314., 354.]])

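*Just as a side note, the median of [100, 101, 102, 103] is 101.5, which np.round turns into 102. That is why the first row is [82, 122] rather than the (83, 123) given in the question.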
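
Since the question asks for a value that actually occurs in each group, one option, shown here only as a sketch, is to index the middle element of each sub-array instead of rounding the median. Picking the upper of the two middle values for even-length groups is an assumption, and the helper name middle_element is hypothetical.

```python
import numpy as np

data = np.array([100, 101, 102, 103, 170, 171, 172,
                 252, 253, 254, 255, 256, 333, 334, 335])
# split wherever two neighbours are not exactly 1 apart
groups = np.split(data, np.where(np.diff(data) != 1)[0] + 1)

def middle_element(group):
    # for even lengths this picks the upper of the two middle values
    return int(group[len(group) // 2])

num = 20
centers = [middle_element(g) for g in groups]
final_array = [(c - num, c + num) for c in centers]
print(centers)      # [102, 171, 254, 334]
print(final_array)  # [(82, 122), (151, 191), (234, 274), (314, 354)]
```

Indexing also sidesteps a rounding subtlety: np.round rounds halves to the nearest even integer, so a median of 100.5 would become 100 while 101.5 becomes 102.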

Upvotes: 3

Ulises Bussi

Reputation: 1725

As an alternative solution (avoiding np.vectorize), loop over the sub-arrays directly:

import numpy as np

data=np.array([100,101,102,103,170,171,172,252,253,254,255,256,333,334,335])
ddiff = np.diff(data)

#split data
subArrays = np.split(data, np.where(ddiff != 1)[0]+1)

c_val = 20
medians = []
extremes = []
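for subArray in subArrays:
    # round the median to the nearest integer and keep it as a plain Python int
    medians.append(int(np.round(np.median(subArray))))
    # window of +/- c_val around the rounded median
    extremes.append((medians[-1] - c_val, medians[-1] + c_val))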

print(extremes)

#outputs
# [(82, 122), (151, 191), (234, 274), (314, 354)]

Upvotes: 1
