user2667066

Reputation: 2099

Create a set of slices from a numpy array, for parallelization

I have a numpy array like ids = np.array([0,0,0,1,1,2,2,2,2,4,5,5,5]) and some other numpy arrays (say a and b) of the same length. I want to carry out independent operations on slices of these arrays, where each slice covers a contiguous run of indexes that share the same id. That is, I want to define a set of slices like

slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
...

so that I can call a function f(a[slice_n],b[slice_n]) for each n in parallel. How do I construct the slices in numpy? If it helps, in R I would do it with something like tapply.

Upvotes: 0

Views: 389

Answers (4)

Paul Panzer

Reputation: 53069

If you want to chop up an array into chunks along an axis, the simplest way is np.split:

>>> a = np.arange(10)
>>> split_points = (2,5,7)
>>> np.split(a, split_points)
[array([0, 1]), array([2, 3, 4]), array([5, 6]), array([7, 8, 9])]

For evenly sized chunks, you can generate split_points with np.arange.

To create split points from an id array, use split_points = np.where(np.diff(ids))[0] + 1

If your id array is sorted and you also have the ids without repeats (ids_wor), then split_points = np.searchsorted(ids, ids_wor)[1:] might be faster.
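As a rough sketch of both approaches applied to the ids from the question: a and b here are made-up placeholder arrays, and ids_wor is computed with np.unique only for illustration (the searchsorted variant pays off mainly if you already have it).

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size)        # placeholder data, not from the question
b = np.arange(ids.size) * 10   # placeholder data, not from the question

# A split point is every position where the id differs from its predecessor.
split_points = np.where(np.diff(ids))[0] + 1

# The same points apply to every array that has the same length as ids.
a_chunks = np.split(a, split_points)
b_chunks = np.split(b, split_points)

# Alternative for sorted ids: look up where each distinct id first appears.
ids_wor = np.unique(ids)       # assumed here; ideally you already have this
alt_points = np.searchsorted(ids, ids_wor)[1:]

print(split_points)            # [ 3  5  9 10]
```

Both variants give the same split points for this input, so a_chunks[n] and b_chunks[n] line up with the n-th run of ids.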

Upvotes: 0

Daniel F

Reputation: 14399

To get your split points:

spl=np.r_[0, np.nonzero(np.diff(ids))[0] + 1, ids.size]

Then build a list of slices:

slices=[slice(i,j) for i,j in zip(spl[:-1].flat, spl[1:].flat)]

Or split your other arrays directly:

a_spl=np.split(a,spl[1:-1])

EDIT: since ids is sorted and in order, you can either use np.unique on it or build one boolean mask per id (if you have the memory):

slices = list(np.unique(ids)[:,None] == ids[None,:])
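To make the comparison concrete, here is a small sketch that builds both the slice objects and the boolean masks for the ids from the question and checks that they select the same elements. The array a is a placeholder, and the boundaries are taken from np.nonzero(np.diff(ids))[0] directly.

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size)  # placeholder data, not from the question

# Boundaries of each run, including the start (0) and the end (ids.size).
spl = np.r_[0, np.nonzero(np.diff(ids))[0] + 1, ids.size]

# One slice per run; tolist() turns NumPy integers into plain Python ints.
slices = [slice(i, j) for i, j in zip(spl[:-1].tolist(), spl[1:].tolist())]

# One boolean mask per distinct id; this needs n_ids * len(ids) booleans.
masks = list(np.unique(ids)[:, None] == ids[None, :])

for s, m in zip(slices, masks):
    assert np.array_equal(a[s], a[m])  # both select the same run
```

Slices give views into a and cost almost nothing to store, while mask indexing copies the data, so slices are probably the better fit when the runs are contiguous.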

Upvotes: 1

B. M.

Reputation: 18658

One way to do that:

In [12]: arrays=np.vstack((a,b))

In [13]: arrays
Out[13]: 
array([[4, 1, 4, 2, 5, 7, 1, 5, 9],
       [8, 1, 1, 1, 9, 3, 0, 3, 1]])

In [14]: subarrays=np.split(arrays,[3,5],axis=1)

In [15]: subarrays
Out[15]: 
[array([[4, 1, 4],
        [8, 1, 1]]), 
 array([[2, 5],
        [1, 9]]), 
 array([[7, 1, 5, 9],
        [3, 0, 3, 1]])]

In [16]: [np.multiply(x,y) for (x,y) in subarrays]
Out[16]: [array([32,  1,  4]), array([ 2, 45]), array([21,  0, 15,  9])]
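The split points [3,5] are hard-coded in the session above. A minimal sketch of deriving them from an id array instead, assuming ids holds the first nine ids from the question (the answer itself does not show an ids array):

```python
import numpy as np

# Assumed ids for the 9-element example; a and b are copied from the session.
ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
a = np.array([4, 1, 4, 2, 5, 7, 1, 5, 9])
b = np.array([8, 1, 1, 1, 9, 3, 0, 3, 1])

arrays = np.vstack((a, b))

# Positions where the id changes mark the column boundaries.
split_points = np.flatnonzero(np.diff(ids)) + 1

# Split along axis=1 so each chunk keeps its row of a and its row of b.
subarrays = np.split(arrays, split_points, axis=1)

# Each chunk has shape (2, n), so it unpacks into its a-part and b-part.
products = [np.multiply(x, y) for x, y in subarrays]
```

Stacking works here because a and b share a dtype; with mixed dtypes np.vstack would upcast, in which case splitting each array separately may be safer.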

Upvotes: 0

zio_tom

Reputation: 21

I'm not sure I understand your question. Perhaps you intended

slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
slice_3 = 9:10
slice_4 = 10:13

If this is the case, you can use NumPy's unique:

_, idx, count = numpy.unique(ids, return_index=True, return_counts=True)

The lower limit of each slice is idx and the upper limit is idx + count. This assumes that equal ids are contiguous, as in your example.
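A sketch of how idx and count could feed the parallel call from the question. The function f and the arrays a and b are hypothetical stand-ins, and a thread pool from the standard library is just one possible choice of executor.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size, dtype=float)  # placeholder data
b = np.ones(ids.size)                 # placeholder data


def f(a_part, b_part):
    """Hypothetical per-group operation: dot product of the two chunks."""
    return float(np.dot(a_part, b_part))


# idx is the first position of each id, count is the length of its run.
_, idx, count = np.unique(ids, return_index=True, return_counts=True)
slices = [slice(lo, hi) for lo, hi in zip(idx.tolist(), (idx + count).tolist())]

# Run f on every group; pool.map returns results in the order of slices.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda s: f(a[s], b[s]), slices))

print(results)  # [3.0, 7.0, 26.0, 9.0, 33.0]
```

Threads only help if f spends its time in code that releases the GIL (many NumPy routines do). For pure-Python work a ProcessPoolExecutor may be a better fit, but then f has to be a picklable top-level function rather than a lambda.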

Upvotes: 1
