Reputation: 89
I have a NumPy array that consists of groups of sequential values, and I would like to find the median value (or the closest integer to it) of each group. After that, I want to create new arrays by subtracting and adding a fixed value.
Example: data=[100,101,102,103,170,171,172,252,253,254,255,256,333,334,335]
Desired result:
the median value of the first group (closest to the median): 103,
the median value of the second: 171,
the median value of the third: 254,
the median value of the fourth: 334
I want to subtract and add the same value to each of those numbers, let's say 20; then:
final_array =[(83,123), (151,191), (234,274), (314, 354)]
It does not have to be the exact median, but it should be a number from the sublist. How can I do this in Python?
Thanks in advance...
Upvotes: 0
Views: 615
Reputation: 80329
Here is a full NumPy approach that avoids the unequal-length arrays created by np.split. (It only makes a difference if the data is really huge.)
import numpy as np
data = [100, 101, 102, 103, 170, 171, 172, 252, 253, 254, 255, 256, 333, 334, 335]
data = np.array(data)
# make a flat array of indices where the step is different from 1
change_pos = np.flatnonzero(np.diff(data) != 1)
# the start of each group is at change_pos+1, and also at position 0
starts = np.append(0, change_pos + 1)
# the end of each group is at change_pos and also at the very end
ends = np.append(change_pos, len(data) - 1)
# the medians are the rounded mean of the start and end values
medians = (data[starts] + data[ends] + 1) // 2
num = 20
# create two columns, subtracting and adding num
final_array = np.c_[medians - num, medians + num]
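If the result has to be an actual element of each group, as the question asks, one option is to take the element at the middle index instead of averaging the endpoints. The following is only a sketch under assumptions: the function name group_middles, its step parameter, and the choice of the later of the two central elements for even-length groups are illustrative and not part of the answer above. It also assumes the input is non-empty.

```python
import numpy as np

def group_middles(data, step=1):
    """Return the middle element of each run of values spaced by `step`."""
    data = np.asarray(data)
    # indices where the gap to the next value differs from `step`
    change_pos = np.flatnonzero(np.diff(data) != step)
    starts = np.append(0, change_pos + 1)
    ends = np.append(change_pos, len(data) - 1)
    # upper-middle index: even-length groups pick the later central element
    mid = (starts + ends + 1) // 2
    return data[mid]

data = [100, 101, 102, 103, 170, 171, 172, 252, 253, 254, 255, 256, 333, 334, 335]
middles = group_middles(data)
num = 20
final_array = np.c_[middles - num, middles + num]
print(middles)      # [102 171 254 334]
print(final_array)
```

For consecutive integers this gives the same numbers as the rounded mean of the endpoints, but the value is always read from the data itself. Using (starts + ends) // 2 instead would pick the earlier central element (101 for the first group).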
Upvotes: 0
Reputation: 872
You can do something like this:
First, let's split the main array into sequential sub-arrays:
splitted_data = np.array(np.split(data, np.where(np.diff(data) != 1)[0]+1), dtype=object)
Essentially, we search the array for positions where the difference between two consecutive numbers is not 1 and split the array at each of them.
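The 1 in the != 1 comparison can of course be changed if you are looking for sequences with a different step. The +1 at the end only shifts each index so that the split happens at the first element of the next group.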
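A small sketch of splitting on a different step. Here the step is the value that np.diff is compared against; the function name split_runs and its step parameter are hypothetical and only used for illustration.

```python
import numpy as np

def split_runs(data, step=1):
    """Split `data` into runs whose consecutive values differ by `step`."""
    data = np.asarray(data)
    # np.diff index i describes the gap between data[i] and data[i + 1],
    # so the next run starts at i + 1
    breaks = np.where(np.diff(data) != step)[0] + 1
    return np.split(data, breaks)

evens = [2, 4, 6, 20, 22, 40]
runs = split_runs(evens, step=2)
run_lists = [r.tolist() for r in runs]
print(run_lists)  # [[2, 4, 6], [20, 22], [40]]
```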
Now, since splitted_data is an np.array holding objects of different shapes, np.median won't work "as-is", so let's np.vectorize that method:
vectorized_med = np.vectorize(np.median)
Then just extract the medians with the vectorized function and round them to match the closest-int requirement:
medians = np.round(vectorized_med(splitted_data))
Now you can construct your final array with a list comprehension:
num = 20
final_array = np.array([(i - num, i + num) for i in medians])
Final output:
array([[ 82., 122.],
       [151., 191.],
       [234., 274.],
       [314., 354.]])
*Just as a side note, the median of [100, 101, 102, 103] is 101.5.
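Because that median falls exactly halfway between two integers, the rounding rule decides which neighbour you get. A brief sketch of how np.round treats such ties (it rounds halves to the nearest even value), next to an explicit round-half-up alternative; the variable names are only illustrative.

```python
import numpy as np

first_group = np.array([100, 101, 102, 103])
med = np.median(first_group)       # 101.5, the mean of the two central values

# np.round sends ties to the nearest even value
rounded = np.round(med)            # 102.0
tie_to_even = np.round(102.5)      # 102.0, not 103.0

# explicit round-half-up, if ties should always go upward
half_up = np.floor(102.5 + 0.5)    # 103.0

print(med, rounded, tie_to_even, half_up)
```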
Upvotes: 3
Reputation: 1725
As an alternative solution (avoiding np.vectorize):
import numpy as np
data=np.array([100,101,102,103,170,171,172,252,253,254,255,256,333,334,335])
ddiff = np.diff(data)
#split data
subArrays = np.split(data, np.where(ddiff != 1)[0]+1)
c_val = 20
medians = []
extremes = []
for subArray in subArrays:
    # rounded median of the current group, as an int
    medians.append(np.round(np.median(subArray)).astype(int))
    extremes.append((medians[-1] - c_val, medians[-1] + c_val))
print(extremes)
#outputs
# [(82, 122), (151, 191), (234, 274), (314, 354)]
Upvotes: 1