Reputation: 63
How can I join two numpy ndarrays to accomplish the following in a fast way, using optimized numpy, without any looping?
>>> a = np.random.rand(2,2)
>>> a
array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
>>> b = np.random.rand(2,2)
>>> b
array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ]])
>>> c = ???
>>> c
array([[ 0.09028802, 0.2274419, 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ]])
Upvotes: 5
Views: 3753
Reputation: 1
Try either np.hstack or np.vstack. This would work even for arrays that are not the same length. All you would need to do is this: np.hstack(appendedarray[:]) or np.vstack(appendedarray[:])
Upvotes: 0
Reputation: 221534
Let's walk through a prospective solution to handle generic cases involving different shaped arrays with some inlined comments to explain the method involved.
(1) First off, we store shapes of input arrays.
ma,na = a.shape
mb,nb = b.shape
(2) Next up, initialize a 3D array with the number of columns being the sum of the number of columns in input arrays a and b. Use np.empty for this task.
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
(3) Then, fill the first "na" columns of the 3D array with the rows from a, using a[:,None,:]. When we assign it to out[:,:,:na], that second colon tells NumPy to broadcast along the second axis, as always happens with singleton dims in NumPy arrays. In effect, this is the same as tiling/repeating, but possibly done in a more efficient way.
out[:,:,:na] = a[:,None,:]
(4) Repeat for setting elements from b into the output array. This time we broadcast along the first axis of out with out[:,:,na:], with that first colon helping us do that broadcasting.
out[:,:,na:] = b
(5) The final step is to reshape the output to 2D. This can be done by simply assigning the required 2D shape tuple to out.shape. Reshaping this way only changes the view of the data and is effectively zero cost.
out.shape = (ma*mb,na+nb)
Condensing everything, the full implementation would look like this -
ma,na = a.shape
mb,nb = b.shape
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
out[:,:,:na] = a[:,None,:]
out[:,:,na:] = b
out.shape = (ma*mb,na+nb)
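For anyone who wants to check the result, here is a minimal sketch that wraps the steps above in a function and compares it against a plain Python loop. The function name `cartesian_hstack` and the use of `np.result_type` for the output dtype are my own assumptions for illustration; the code above uses `a.dtype` directly.

```python
import numpy as np

def cartesian_hstack(a, b):
    """Pair every row of a with every row of b, side by side (hypothetical helper)."""
    ma, na = a.shape
    mb, nb = b.shape
    # Assumption: pick a dtype that can hold both inputs, not just a.dtype.
    out = np.empty((ma, mb, na + nb), dtype=np.result_type(a, b))
    out[:, :, :na] = a[:, None, :]   # each row of a is broadcast across the mb axis
    out[:, :, na:] = b               # b is broadcast across the ma axis
    return out.reshape(ma * mb, na + nb)

a = np.arange(6).reshape(3, 2)
b = np.arange(10, 18).reshape(4, 2)
c = cartesian_hstack(a, b)

# Reference result built with an explicit loop, used only for checking.
expected = np.array([np.concatenate((ra, rb)) for ra in a for rb in b])
```

The loop version is only there as a cross-check; the function itself does no Python-level looping.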
Upvotes: 2
Reputation: 97281
You can use dstack() and broadcast_arrays():
import numpy as np
a = np.random.randint(0, 10, (3, 2))
b = np.random.randint(10, 20, (4, 2))
np.dstack(np.broadcast_arrays(a[:, None], b)).reshape(-1, a.shape[-1] + b.shape[-1])
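To see why that one-liner works, it may help to split it into steps and look at the intermediate shapes. This is just a sketch with small integer arrays of my own choosing; the names `a3`, `b3` and `stacked` are only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)          # 3 rows
b = np.arange(10, 18).reshape(4, 2)     # 4 rows

# a[:, None] has shape (3, 1, 2); broadcasting it against b (4, 2)
# gives both arrays the common shape (3, 4, 2).
a3, b3 = np.broadcast_arrays(a[:, None], b)

# dstack joins along the third axis, so cell (i, j) holds
# row i of a followed by row j of b.
stacked = np.dstack((a3, b3))           # shape (3, 4, 4)

# Collapse the first two axes into one row axis.
c = stacked.reshape(-1, a.shape[-1] + b.shape[-1])
```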
Upvotes: 0
Reputation: 10863
What you want is, apparently, the cartesian product of a and b, stacked horizontally. You can use the itertools module to generate the indices for the numpy arrays, then numpy.hstack to stack them:
import numpy as np
from itertools import product
a = np.array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
b = np.array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ],
[ 0.79941110, 0.52273 ]])
a_inds, b_inds = map(list, zip(*product(range(len(a)), range(len(b)))))
c = np.hstack((a[a_inds], b[b_inds]))
This results in a c of:
array([[ 0.09028802, 0.2274419 , 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419 , 0.69181444, 0.672248 ],
[ 0.09028802, 0.2274419 , 0.7994111 , 0.52273 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.7994111 , 0.52273 ]])
Breaking down the indices thing:
product(range(len(a)), range(len(b)))
will generate something that looks like this if you convert it to a list (shown here for two arrays with 2 rows each):
[(0, 0), (0, 1), (1, 0), (1, 1)]
You want something like this: [0, 0, 1, 1], [0, 1, 0, 1], so you need to transpose the generator. The idiomatic way to do this is with zip(*zipped_thing). However, if you just directly assign these, you'll get tuples, like this:
[(0, 0, 1, 1), (0, 1, 0, 1)]
But numpy arrays interpret tuples as multi-dimensional indexes, so you want to turn them into lists, which is why I mapped the list constructor onto the transposed (zipped) result of the product function.
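The tuple-versus-list point is easy to check for yourself. Here is a small sketch with made-up integer arrays; the variable names `a_tup`, `b_tup` and `tuple_indexing_failed` are only for illustration.

```python
import numpy as np
from itertools import product

a = np.arange(4).reshape(2, 2)          # 2 rows
b = np.arange(10, 16).reshape(3, 2)     # 3 rows

pairs = list(product(range(len(a)), range(len(b))))
a_tup, b_tup = zip(*pairs)   # tuples: (0, 0, 0, 1, 1, 1) and (0, 1, 2, 0, 1, 2)

# A tuple is read as one index per axis, so a 6-tuple on a 2-D array is rejected.
try:
    a[a_tup]
    tuple_indexing_failed = False
except IndexError:
    tuple_indexing_failed = True

# A list is read as "pick these rows", which is what we want here.
a_inds, b_inds = list(a_tup), list(b_tup)
c = np.hstack((a[a_inds], b[b_inds]))
```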
Upvotes: 3
Reputation: 353059
Not the prettiest, but you could combine hstack, repeat, and tile:
>>> a = np.arange(4).reshape(2,2)
>>> b = a+10
>>> a
array([[0, 1],
[2, 3]])
>>> b
array([[10, 11],
[12, 13]])
>>> np.hstack([np.repeat(a,len(b),0),np.tile(b,(len(a),1))])
array([[ 0, 1, 10, 11],
[ 0, 1, 12, 13],
[ 2, 3, 10, 11],
[ 2, 3, 12, 13]])
Or for a 3x3 case:
>>> a = np.arange(9).reshape(3,3)
>>> b = a+10
>>> np.hstack([np.repeat(a,len(b),0),np.tile(b,(len(a),1))])
array([[ 0, 1, 2, 10, 11, 12],
[ 0, 1, 2, 13, 14, 15],
[ 0, 1, 2, 16, 17, 18],
[ 3, 4, 5, 10, 11, 12],
[ 3, 4, 5, 13, 14, 15],
[ 3, 4, 5, 16, 17, 18],
[ 6, 7, 8, 10, 11, 12],
[ 6, 7, 8, 13, 14, 15],
[ 6, 7, 8, 16, 17, 18]])
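Both examples above use arrays with the same number of rows. Here is a minimal sketch of the same idea for inputs with different row counts, where each row of a is repeated len(b) times and b is tiled len(a) times. The names `left` and `right` are only for illustration.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)          # 2 rows
b = np.arange(10, 16).reshape(3, 2)     # 3 rows

left = np.repeat(a, len(b), axis=0)     # each row of a repeated once per row of b
right = np.tile(b, (len(a), 1))         # all of b repeated once per row of a
c = np.hstack([left, right])
```

With the counts the other way around, the two halves would have mismatched row orderings or lengths whenever len(a) != len(b).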
Upvotes: 3
Reputation: 142
All arrays are indexable, so you can merge them by just calling:
a[:2],b[:2]
or you can use the core numpy stacking functions, which take the arrays as a single sequence and should look something like this:
c = np.vstack((a,b))
Upvotes: -1