gpane
gpane

Reputation: 41

2d array as index of a 3d array

I had a 2D array (C) with 8000x64 elements, an 1D array (s) with 8000x1 elements and another 1D array (d) with 1x64 elements. Every row of index i, where s[i] is True, shall be added by vector d. This works quite well:

C[s == True] += d

Now I have added one dimension to C, s, and d and the logic above shall be applied to every element of the additional dimension.

The following code does what I want, but it's very slow.

for i in range(I):
        C_this = C[:,:,i]
        s_this = s[:,i]
        d_this = d[:,i]

        C_this[s_this == True] += d_this
        C[:,:,i] = C_this

Is there a numpy way to do this without a for loop?

Upvotes: 0

Views: 291

Answers (3)

bburks832
bburks832

Reputation: 81

Here is my implementation of the method @hpaulj proposed. Note that I don't want to take the credit from him, so please mark his answer, not mine, as correct. Just wanted to share what I did.

import numpy as np
import numpy.random as npr

C = np.zeros((100, 8000, 64), dtype=int)
s = np.zeros((100, 8000), dtype=bool)
d = np.zeros((100, 64), dtype=int)

C[:,:,:] = npr.randint(50, size=C.shape)
s[:,:] = npr.randint(3, size=s.shape)
d[:,:] = npr.randint(10, size=d.shape)

I, J = np.nonzero(s)
C[I, J] += d[I]

I then profiled the program I made, and it runs on my machine in less than 450 milliseconds (the last two lines take less than 300 ms). Note that the calls to "randint" were just to set up the array values, so those lines wouldn't apply to your use case.

Upvotes: 0

hpaulj
hpaulj

Reputation: 231335

It's easier with the extra dimension at the beginning:

In [376]: C = np.zeros((4,2,3),int)                                                            
In [377]: s = np.array([[0,0],[0,1],[1,0],[1,1]],bool)                                         
In [378]: d = np.arange(1,13).reshape(4,3)                                                     
In [379]: C.shape, s.shape, d.shape                                                            
Out[379]: ((4, 2, 3), (4, 2), (4, 3))
In [380]: I,J = np.nonzero(s)                                                                  
In [381]: I,J                                                                                  
Out[381]: (array([1, 2, 3, 3]), array([1, 0, 0, 1]))

In [383]: C[I,J]=d[I]                                                                          
In [384]: C                                                                                    
Out[384]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])

Your way:

In [385]: C = np.zeros((4,2,3),int)                                                            
In [386]: for i in range(4): 
     ...:     C[i,:,:][s[i,:]] += d[i,:] 
     ...:                                                                                      
In [387]: C                                                                                    
Out[387]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])

Upvotes: 2

Mad Physicist
Mad Physicist

Reputation: 114230

Due to how numpy indexing works, s selects the relevant rows of C in the first example. To do the same thing in the 3D case, you would have to reshape C into something that is (8000*3, 64) and s into (8000*3, 1). The only problem now is getting d to account for the different number of rows in each third dimension, which can be done with np.repeat.

The first part is

C2 = np.swapaxes(C, -1, 1).reshape(-1, 64)

This is extremely inefficient because it copies your entire array. A better arrangement would be if C had shape (3, 8000, 64) to begin with. Then you would only need to ravel the first two axes to get the proper shape and memory layout, without copying data.

repeats = np.count_nonzero(s, axis=0)
C.reshape(-1, 64)[s.ravel()] += np.repeat(d, repeats, axis=0)

Since the reshape operation returns a view in this case, the indexing should work properly to increment in-place. I don't think this approach is necessarily very good though, since it copies each row of d as many times as s is non-zero in the corresponding element of the new dimension.

Upvotes: 1

Related Questions