pythonarraysperformancenumpyvectorization

Reputation: 63

Combine or join numpy arrays

How can I join two numpy ndarrays to accomplish the following in a fast way, using optimized numpy, without any looping?

>>> a = np.random.rand(2,2)
>>> a
array([[ 0.09028802,  0.2274419 ],
       [ 0.35402772,  0.87834376]])

>>> b = np.random.rand(2,2)
>>> b
array([[ 0.4776325 ,  0.73690098],
       [ 0.69181444,  0.672248  ]])

>>> c = ???
>>> c
array([[ 0.09028802,  0.2274419, 0.4776325 ,  0.73690098],
       [ 0.09028802,  0.2274419, 0.69181444,  0.672248  ],
       [ 0.35402772,  0.87834376, 0.4776325 ,  0.73690098],
       [ 0.35402772,  0.87834376, 0.69181444,  0.672248  ]])

Upvotes: 5

Answers (6)

Estrob

Reputation: 1

Try either np.hstack or np.vstack. This would work even for arrays that are not the same length. All you would need to do is this: np.hstack(appendedarray[:]) or np.vstack(appendedarray[:])

Upvotes: 0

Divakar

Reputation: 221534

Let's walk through a prospective solution to handle generic cases involving different shaped arrays with some inlined comments to explain the method involved.

(1) First off, we store shapes of input arrays.

ma,na = a.shape
mb,nb = b.shape

(2) Next up, initialize a 3D array with number of columns being the sum of number of columns in input arraysa and b. Use np.empty for this task.

out = np.empty((ma,mb,na+nb),dtype=a.dtype)

(3) Then, set the first axis of the 3D array for the first "na" columns with the rows from a with a[:,None,:]. So, if we assign it to out[:,:,:na], that second colon would indicate to NumPy that we need a broadcasted setting, if possible as always happens with singleton dims in NumPy arrays. In effect, this would be same as tiling/repeating, but possibly in an efficient way.

out[:,:,:na] = a[:,None,:]

(4) Repeat for setting elements from b into output array. This time we would broadcast along the first axis of out with out[:,:,na:], with that first colon helping us do that broadcasting.

out[:,:,na:] = b

(5) Final step is to reshape the output to a 2D shape. This could be done with simply changing the shape with the required 2D shape tuple. Reshaping just changes view and is effectively zero cost.

out.shape = (ma*mb,na+nb)

Condensing everything, the full implementation would look like this -

ma,na = a.shape
mb,nb = b.shape
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
out[:,:,:na] = a[:,None,:]
out[:,:,na:] = b
out.shape = (ma*mb,na+nb)

Upvotes: 2

HYRY

Reputation: 97281

You can use dstack() and broadcast_arrays():

import numpy as np

a = np.random.randint(0, 10, (3, 2))
b = np.random.randint(10, 20, (4, 2))

np.dstack(np.broadcast_arrays(a[:, None], b)).reshape(-1, a.shape[-1] + b.shape[-1])

Upvotes: 0

Paul

Reputation: 10863

What you want is, apparently, the cartesian product of a and b, stacked horizontally. You can use the itertools module to generate the indices for the numpy arrays, then numpy.hstack to stack them:

import numpy as np
from itertools import product

a = np.array([[ 0.09028802,  0.2274419 ],
              [ 0.35402772,  0.87834376]])

b = np.array([[ 0.4776325 ,  0.73690098],
              [ 0.69181444,  0.672248  ],
              [ 0.79941110,  0.52273   ]])

a_inds, b_inds = map(list, zip(*product(range(len(a)), range(len(b)))))

c = np.hstack((a[a_inds], b[b_inds]))

This results in a c of:

array([[ 0.09028802,  0.2274419 ,  0.4776325 ,  0.73690098],
       [ 0.09028802,  0.2274419 ,  0.69181444,  0.672248  ],
       [ 0.09028802,  0.2274419 ,  0.7994111 ,  0.52273   ],
       [ 0.35402772,  0.87834376,  0.4776325 ,  0.73690098],
       [ 0.35402772,  0.87834376,  0.69181444,  0.672248  ],
       [ 0.35402772,  0.87834376,  0.7994111 ,  0.52273   ]])

Breaking down the indices thing:

product(range(len(a)), range(len(b)) will generate something that looks like this if you convert it to a list:

[(0, 0), (0, 1), (1, 0), (1, 1)]

You want something like this: [0, 0, 1, 1], [0, 1, 0, 1], so you need to transpose the generator. The idiomatic way to do this is with zip(*zipped_thing). However, if you just directly assign these, you'll get tuples, like this:

[(0, 0, 1, 1), (0, 1, 0, 1)]

But numpy arrays interpret tuples as multi-dimensional indexes, so you want to turn them to lists, which is why I mapped the list constructor onto the result of the product function.

Upvotes: 3

DSM

Reputation: 353059

Not the prettiest, but you could combine hstack, repeat, and tile:

>>> a = np.arange(4).reshape(2,2)
>>> b = a+10
>>> a
array([[0, 1],
       [2, 3]])
>>> b
array([[10, 11],
       [12, 13]])
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0,  1, 10, 11],
       [ 0,  1, 12, 13],
       [ 2,  3, 10, 11],
       [ 2,  3, 12, 13]])

Or for a 3x3 case:

>>> a = np.arange(9).reshape(3,3)
>>> b = a+10
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0,  1,  2, 10, 11, 12],
       [ 0,  1,  2, 13, 14, 15],
       [ 0,  1,  2, 16, 17, 18],
       [ 3,  4,  5, 10, 11, 12],
       [ 3,  4,  5, 13, 14, 15],
       [ 3,  4,  5, 16, 17, 18],
       [ 6,  7,  8, 10, 11, 12],
       [ 6,  7,  8, 13, 14, 15],
       [ 6,  7,  8, 16, 17, 18]])

Upvotes: 3

zdeeb

Reputation: 142

All arrays are indexable, so you can merge the by just calling:

a[:2],b[:2]

or you can use core numpy stacking functions, should look something like this:

c = np.vstack(a,b)

Upvotes: -1

Combine or join numpy arrays

Answers (6)

Related Questions