jlcv

Reputation: 1808

Vectorization - Adding numpy arrays without loops?

So I have the following numpy arrays:

c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
X = array([[10, 15, 20,  5],
           [ 1,  2,  6, 23]])
y = array([1, 1])

I am trying to add each 1x4 row of X to one of the columns of c, where y specifies the target column for each row. In the example above, both rows of X are added to column 1 of c, so the expected result is:

     c = array([[ 1,  2+10+1,  3],  =  array([[ 1,  13,  3],
                [ 4,  5+15+2,  6],            [ 4,  22,  6],
                [ 7,  8+20+6,  9],            [ 7,  34,  9],
                [10, 11+5+23, 12]])           [10,  39, 12]])  

Does anyone know how I can do this without any loops? I tried c[:,y] += X, but it seems to add only the second row of X to column 1 of c. Note that y does not have to be [1,1]; it could also be [0,1], in which case the first row of X is added to column 0 of c and the second row of X to column 1 of c.

Upvotes: 2

Views: 1278

Answers (3)

Tonechas

Reputation: 13733

This is the solution I came up with:

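import numpy as np

def my_func(c, X, y):
    # One zero-filled layer shaped like c per row of X; row i goes into column y[i] of layer i.
    cc = np.zeros((len(y), c.shape[0], c.shape[1]))
    cc[range(len(y)), :, y] = X
    # Summing the layers accumulates duplicates in y (the result is float).
    return c + np.sum(cc, 0)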

The following interactive session demonstrates how it works:

>>> my_func(c, X, y)
array([[  1.,  13.,   3.],
       [  4.,  22.,   6.],
       [  7.,  34.,   9.],
       [ 10.,  39.,  12.]])
>>> y2 = np.array([0, 2])
>>> my_func(c, X, y2)
array([[ 11.,   2.,   4.],
       [ 19.,   5.,   8.],
       [ 27.,   8.,  15.],
       [ 15.,  11.,  35.]])

Upvotes: 0

hpaulj

Reputation: 231385

My first thought when I saw your desired calculation was to sum the 2 rows of X and add the result to the 2nd column of c:

In [636]: c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])

In [637]: c[:,1]+=X.sum(axis=0)

In [638]: c
Out[638]: 
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])

But if we want to work from a general index like y, and y may contain duplicates, we need a special unbuffered operation:

In [639]: c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])

In [641]: np.add.at(c,(slice(None),y),X.T)

In [642]: c
Out[642]: 
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])

See the ufunc .at method in the NumPy docs. In IPython, np.add.at? shows the docstring, which includes:

Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
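A minimal sketch of the difference that docstring describes, using two small hypothetical arrays a and b (names chosen for illustration, not taken from the session above):

```python
import numpy as np

# Hypothetical arrays for illustration only.
a = np.zeros(3, dtype=int)
b = np.zeros(3, dtype=int)

# Buffered: a[[0, 0]] is read once, incremented, then written back,
# so both writes store the same value.
a[[0, 0]] += 1

# Unbuffered: each occurrence of index 0 is applied in turn.
np.add.at(b, [0, 0], 1)

print(a)  # element 0 was incremented once
print(b)  # element 0 was incremented twice
```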

With a different y it still works (starting again from the original c):

In [645]: np.add.at(c,(slice(None),[0,2]),X.T)

In [646]: c
Out[646]: 
array([[11,  2,  4],
       [19,  5,  8],
       [27,  8, 15],
       [15, 11, 35]])

Upvotes: 3

define cindy const

Reputation: 632

Firstly, your code seems to work in general if you transpose X. For example:

c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
X = array([[10, 15, 20,  5],
           [ 1,  2,  6, 23]]).transpose()
y = array([1, 2])

c[:,y] += X
print(c)
#OUTPUT:
#[[ 1 12  4]
# [ 4 20  8]
# [ 7 28 15]
# [10 16 35]]

However, it doesn't work when y contains duplicate columns, as in your specific example. I believe this is because c[:, [1,1]] produces an array with two columns, each holding the values of c[:, 1]. Both columns map back to the same part of c, so when the addition happens they are both read first, the corresponding part of X is added to each, and then they are written back, which means the last one written determines the final value. I don't believe plain indexed addition like this can accumulate over duplicates: that would require editing one column, saving its value back, and then editing it again later.
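The read, add, write-back sequence described above can be spelled out as a rough sketch. The names before, picked and updated are illustrative, and which of the duplicate writes survives is not something I would rely on:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
X = np.array([[10, 15, 20, 5], [1, 2, 6, 23]])
y = np.array([1, 1])

before = c[:, 1].copy()

# c[:, y] += X.T behaves roughly like these three steps:
picked = c[:, y]        # a copy holding column 1 twice
updated = picked + X.T  # each copy gets a different row of X added
c[:, y] = updated       # both copies are written to the same column

# Only one of the two additions survives; they are not accumulated.
print(c[:, 1])
```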

You might have to settle for no duplicates, or otherwise implement something like an accumulator.
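One possible accumulator, sketched under the assumption that every entry of y is a valid column index of c: build a one-hot matrix from y and let a matrix product sum the rows of X per target column. The function name add_rows_to_columns and the variable onehot are made up for this example.

```python
import numpy as np

def add_rows_to_columns(c, X, y):
    """Return c with row i of X added to column y[i]; duplicates accumulate."""
    y = np.asarray(y)
    # onehot[i, j] is 1 when row i of X targets column j of c.
    onehot = np.zeros((len(y), c.shape[1]), dtype=c.dtype)
    onehot[np.arange(len(y)), y] = 1
    # (rows of c, rows of X) @ (rows of X, cols of c) sums rows per column.
    return c + X.T @ onehot

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
X = np.array([[10, 15, 20, 5], [1, 2, 6, 23]])

print(add_rows_to_columns(c, X, [1, 1]))
print(add_rows_to_columns(c, X, [0, 2]))
```

Unlike the in-place c[:,y] += X, this returns a new array and leaves c untouched.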

Upvotes: 0
