Reputation: 2099
I have a numpy array like ids = np.array([0,0,0,1,1,2,2,2,2,4,5,5,5]) and some other numpy arrays (say a and b) of the same length. I want to carry out some independent operations on slices of these arrays, where each slice covers the indexes that share the same (contiguous) id. I.e. I want to define a set of slices like
slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
...
so that I can call a function f(a[slice_n],b[slice_n])
for each n in parallel. How do I construct the slices in numpy? If it helps, in R I would do it with something like tapply.
Upvotes: 0
Views: 389
Reputation: 53069
If you want to chop up an array into chunks along an axis, the simplest way is np.split:
>>> a = np.arange(10)
>>> split_points = (2,5,7)
>>> np.split(a, split_points)
[array([0, 1]), array([2, 3, 4]), array([5, 6]), array([7, 8, 9])]
If you want even splitting you can use np.arange for split_points.
To create split points from an id array, use split_points = np.where(np.diff(ids))[0] + 1
If your id array is sorted and you also have the ids without repeats (ids_wor), then split_points = np.searchsorted(ids, ids_wor)[1:] might be faster.
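A minimal sketch putting both variants together on the ids from the question. The contents of a and b are placeholder data, and ids_wor is built here with np.unique only for illustration:

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size)        # placeholder data, not from the question
b = np.arange(ids.size) * 10   # placeholder data, not from the question

# A boundary sits right after every position where consecutive ids differ.
split_points = np.where(np.diff(ids))[0] + 1

# Alternative for sorted ids: look up where each distinct id first appears.
ids_wor = np.unique(ids)       # "ids without repeats", built here for illustration
split_points_alt = np.searchsorted(ids, ids_wor)[1:]

# Split the companion arrays at the same boundaries.
a_chunks = np.split(a, split_points)
b_chunks = np.split(b, split_points)
```

Both variants give the same split points for this input, and a_chunks[n] lines up with b_chunks[n], so f(a_chunks[n], b_chunks[n]) can be called per group.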
Upvotes: 0
Reputation: 14399
To get your split points (np.nonzero already returns the indices, so no extra np.where is needed):
spl = np.r_[0, np.nonzero(np.diff(ids))[0] + 1, ids.size]
then a list of slices
slices=[slice(i,j) for i,j in zip(spl[:-1].flat, spl[1:].flat)]
or split your other arrays
a_spl=np.split(a,spl[1:-1])
EDIT: since ids is sorted and in order, you can either use np.unique (as in the other answer) or do boolean slicing, which gives one boolean mask per id instead of a slice object (if you have the memory):
slices = list(np.unique(ids)[:,None] == ids[None,:])
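A rough sketch of how the slices could then be used to call f per group in parallel. The function f, the data in a and b, and the choice of a thread pool are all assumptions for illustration, not something the question specifies:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size, dtype=float)  # placeholder data
b = np.ones(ids.size)                 # placeholder data

def f(x, y):
    # Hypothetical per-group operation; replace with the real one.
    return float(np.dot(x, y))

# Group boundaries, including the start and end of the array.
spl = np.r_[0, np.nonzero(np.diff(ids))[0] + 1, ids.size]
slices = [slice(i, j) for i, j in zip(spl[:-1], spl[1:])]

# Boolean-mask variant: one row per distinct id.
masks = np.unique(ids)[:, None] == ids[None, :]

# Apply f to every group; a thread pool is just one possible choice.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda s: f(a[s], b[s]), slices))
```

For contiguous ids, a[masks[n]] selects the same elements as a[slices[n]], but the masks take len(np.unique(ids)) * ids.size booleans of memory, whereas the slices are views with no copy.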
Upvotes: 1
Reputation: 18658
One way to do that:
In [12]: arrays=np.vstack((a,b))
In [13]: arrays
Out[13]:
array([[4, 1, 4, 2, 5, 7, 1, 5, 9],
       [8, 1, 1, 1, 9, 3, 0, 3, 1]])
In [14]: subarrays=np.split(arrays,[3,5],axis=1)
In [15]: subarrays
Out[15]:
[array([[4, 1, 4],
        [8, 1, 1]]),
 array([[2, 5],
        [1, 9]]),
 array([[7, 1, 5, 9],
        [3, 0, 3, 1]])]
In [16]: [np.multiply(a,b) for (a,b) in subarrays]
Out[16]: [array([32,  1,  4]), array([ 2, 45]), array([21,  0, 15,  9])]
Upvotes: 0
Reputation: 21
I'm not sure I understand your question; perhaps you intended
slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
slice_3 = 9:10
slice_4 = 10:13
If this is the case, you can use NumPy's unique:
_, idx, count = numpy.unique(ids, return_index=True, return_counts=True)
The lower limit of each slice is idx, and the upper limit is idx + count.
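A short sketch of turning idx and count into slice objects, assuming (as in the question) that every id occupies one contiguous run:

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])

# idx holds the first position of each distinct id, count its number of occurrences.
_, idx, count = np.unique(ids, return_index=True, return_counts=True)

# Lower limit is idx, upper limit is idx + count (valid only for contiguous runs).
slices = [slice(int(lo), int(hi)) for lo, hi in zip(idx, idx + count)]
```

With these ids the slices come out as 0:3, 3:5, 5:9, 9:10 and 10:13, matching the list above.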
Upvotes: 1