Reputation: 5011
I have a two-column NumPy array of integers, say:
tarray = array([[ 368, 322],
[ 433, 420],
[ 451, 412],
[ 480, 440],
[ 517, 475],
[ 541, 503],
[ 578, 537],
[ 607, 567],
[ 637, 599],
[ 666, 628],
[ 696, 660],
[ 726, 687],
[ 756, 717],
[ 785, 747],
[ 815, 779],
[ 845, 807],
[ 874, 837],
[ 905, 867],
[ 934, 898],
[ 969, 928],
[ 994, 957],
[1027, 987],
[1057, 1017],
[1086, 1047],
[1117, 1079],
[1148, 1109],
[1177, 1137],
[1213, 1167],
[1237, 1197],
[1273, 1227],
[1299, 1261],
[1333, 1287],
[1357, 1317],
[1393, 1347],
[1416, 1377]])
I am using np.searchsorted to bisect the lower and upper bounds of a set of ranges into column 0, i.e. both bounds of each range (e.g. 241 and 361) are bisected into the array.
ranges = [array([241, 290, 350, 420, 540, 660, 780, 900]),
array([ 361, 410, 470, 540, 660, 780, 900, 1020])]
e.g: np.searchsorted(tarray[:,0], ranges)
This then results in:
array([[ 0, 0, 0, 1, 5, 9, 13, 17],
[ 0, 1, 3, 5, 9, 13, 17, 21]])
where each column of the result holds the start and stop index of one range. What I then want is the position of the minimum value in column 1 of the resulting slice. Here is what I mean, written in plain Python via iteration (where 'f' is the 2-row array returned by searchsorted):
f = array([[ 0, 0, 0, 1, 5, 9, 13, 17],
[ 0, 1, 3, 5, 9, 13, 17, 21]])
for i, (x, y) in enumerate(zip(*f)):
    if y - x:  # skip empty ranges
        print(ranges[1][i], tarray[x:y])
the result is:
410 [[368 322]]
470 [[368 322]
[433 420]
[451 412]]
540 [[433 420]
[451 412]
[480 440]
[517 475]]
660 [[541 503]
[578 537]
[607 567]
[637 599]]
780 [[666 628]
[696 660]
[726 687]
[756 717]]
900 [[785 747]
[815 779]
[845 807]
[874 837]]
1020 [[905 867]
[934 898]
[969 928]
[994 957]]
Now to explain what I want: within the sliced ranges I want the row that has the minimum value in column 1.
e.g 540 [[433 420]
[451 412]
[480 440]
[517 475]]
I want the final result to be 412 (as in [451 412])
e.g
for i, (x, y) in enumerate(zip(*f)):
    if y - x:  # skip empty ranges
        print(ranges[1][i], tarray[x:y, 1].min())
410 322
470 322
540 412
660 503
780 628
900 747
1020 867
Basically I want to vectorise this so that I get back a single array without iterating, as the loop is too slow for my needs. In other words, I want the minimum value in column 1 for each bisected range of values in column 0.
I hope I am being clear!
Upvotes: 3
Views: 649
Reputation: 10759
This appears to achieve your intended goals, using the numpy_indexed package (disclaimer: I am its author):
import numpy as np
import numpy_indexed as npi
# to vectorize the concatenation of the slice ranges, we construct all indices implied in the slicing
counts = f[1] - f[0]
idx = np.ones(counts.sum(), dtype=int)  # np.int was removed in recent NumPy versions
idx[np.cumsum(counts)[:-1]] -= counts[:-1]
tidx = np.cumsum(idx) - 1 + np.repeat(f[0], counts)
# combined with a unique label tagging the output of each slice range, this allows us to use grouping to find the minimum in each group
label = np.repeat(np.arange(len(f.T)), counts)
subtarray = tarray[tidx]
# take the argmin over column 1, since the question asks for the minimum of column 1 in each slice
ridx, sidx = npi.group_by(label).argmin(subtarray[:, 1])
print(ranges[1][ridx])
print(subtarray[sidx, 1])
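If you would rather avoid the extra dependency, something along these lines might also work using only NumPy. This is a rough sketch, not a tested drop-in: the helper name range_min is made up for illustration, and it relies on np.minimum.reduceat being given interleaved start/stop indices, so that every second result is the minimum of one slice. Empty ranges are masked out first, and the values are padded by one element so that a stop index equal to the array length stays valid.

```python
import numpy as np

def range_min(values, starts, stops):
    """Return (nonempty_mask, mins), where mins[k] is values[s:e].min()
    for the k-th non-empty (s, e) pair. Hypothetical helper for illustration."""
    values = np.asarray(values)
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    nonempty = stops > starts
    # Interleave the bounds as [s0, e0, s1, e1, ...]
    bounds = np.column_stack([starts[nonempty], stops[nonempty]]).ravel()
    # Pad by one element so a stop equal to len(values) is a valid index;
    # the padded element never falls inside any [s, e) slice
    padded = np.append(values, values[-1:])
    # Even positions reduce over values[s:e]; odd positions are discarded
    mins = np.minimum.reduceat(padded, bounds)[::2]
    return nonempty, mins

# Small demo on the first five rows of tarray's column 1
col1 = np.array([322, 420, 412, 440, 475])
mask, mins = range_min(col1, [0, 0, 1], [0, 3, 5])
print(mask, mins)
```

With the data from the question this would be called as range_min(tarray[:, 1], f[0], f[1]), and ranges[1][mask] would give the matching upper bounds. Unlike the approach above, it does not build the concatenated index array, so overlapping ranges cost no extra memory.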
Upvotes: 1