Reputation: 949

Joining NumPy arrays column-wise with nested structure

I have the following 3 NumPy arrays:

import numpy as np
arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

I would like to join them column-wise, but have them maintain separation between items coming from each array, like so:

Desired output:
array([[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
       [['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]], dtype='<U1')

However, I cannot find a NumPy function that does this. The closest I got was a plain np.concatenate(), but its output does not retain the separation I want:

Input: np.concatenate([arr1, arr2, arr3], axis=1)
Output:
array([['a', 'b', 'c', 'g', 'h', 'i', 'j', 'k', 'r', 's'],
       ['d', 'e', 'f', 'l', 'm', 'n', 'o', 'p', 't', 'u']], dtype='<U1')
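If the separation only needs to be recoverable rather than stored inside the array itself, the column boundaries that np.concatenate discards can be kept alongside the flat result and used with np.split to cut it back into the original blocks. A minimal sketch, assuming a homogeneous '<U1' array plus a list of split points is an acceptable substitute for a nested structure (`arrays`, `joined`, `widths` and `boundaries` are illustrative names); it works for any number of inputs:

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count

# Concatenate as usual, but remember where each input array ends.
joined = np.concatenate(arrays, axis=1)     # shape (2, 10), dtype '<U1'
widths = [a.shape[1] for a in arrays]       # [3, 5, 2]
boundaries = np.cumsum(widths)[:-1]         # split points: [3, 8]

# np.split cuts the joined array back into the original column blocks.
blocks = np.split(joined, boundaries, axis=1)
print([b.shape for b in blocks])  # [(2, 3), (2, 5), (2, 2)]
```

This keeps the data in one regular array, which stays fast for vectorized operations; the nested view is only materialized when needed.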

Any suggestions on how I can achieve the desired effect?

UPDATE: Thank you for the great answers. As an added difficulty, I would also like the solution to handle a variable number of input arrays, which would all still share the same number of rows: sometimes there would be 3, other times 6, and so on.

Upvotes: 4

Answers (4)

hpaulj

Reputation: 231395

In [13]: arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3) 
    ...: arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5) 
    ...: arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

If I try to make an object dtype array from these arrays, I get an error:

In [22]: np.array([arr1, arr2, arr3])                                                            
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-155b98609c5b> in <module>
----> 1 np.array([arr1, arr2, arr3])

ValueError: could not broadcast input array from shape (2,3) into shape (2)

If they differed in number of rows, this would work (newer NumPy versions also require dtype=object for that), but with a common row number the result is an error even with dtype=object. In such a case, I usually recommend defining an object array of the right size, and filling that:

In [14]: arr = np.empty((2,3), object)                                                           
In [15]: arr                                                                                     
Out[15]: 
array([[None, None, None],
       [None, None, None]], dtype=object)

But if I try to assign the first column, I get the same error:

In [17]: arr[:,0] = arr1                                                                         
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-9894797aa09e> in <module>
----> 1 arr[:,0] = arr1

ValueError: could not broadcast input array from shape (2,3) into shape (2)

I can instead assign row by row:

In [18]: arr[0,0] = arr1[0]                                                                      
In [19]: arr[1,0] = arr1[1]                                                                      
In [20]: arr[0,1] = arr2[0] 
...                                                                     
In [21]: arr                                                                                     
Out[21]: 
array([[array(['a', 'b', 'c'], dtype='<U1'),
        array(['g', 'h', 'i', 'j', 'k'], dtype='<U1'), None],
       [array(['d', 'e', 'f'], dtype='<U1'), None, None]], dtype=object)
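The cell-by-cell assignment generalizes to any number of input arrays with a double loop, which also covers the UPDATE in the question. A minimal sketch; `join_nested` is a hypothetical helper name, and the cells hold NumPy row arrays (use `a[i].tolist()` instead if you prefer lists):

```python
import numpy as np

def join_nested(arrays):
    """Return an (n_rows, len(arrays)) object array whose cell [i, j] is row i of arrays[j]."""
    n_rows = arrays[0].shape[0]
    if any(a.shape[0] != n_rows for a in arrays):
        raise ValueError("all input arrays must have the same number of rows")
    out = np.empty((n_rows, len(arrays)), dtype=object)
    for j, a in enumerate(arrays):
        for i in range(n_rows):
            # A full integer index stores the 1-D row as a single object,
            # so NumPy does not try to broadcast it.
            out[i, j] = a[i]
    return out

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

result = join_nested([arr1, arr2, arr3])
print(result.shape)   # (2, 3)
print(result[1, 1])   # ['l' 'm' 'n' 'o' 'p']
```

Because every cell is assigned explicitly, this also behaves correctly when several inputs happen to have the same number of columns, a case where building the array directly from nested lists would silently produce a regular 3-D array instead.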

Alternatively, we can assign nested lists to the columns, without the broadcast error. This is effectively what the accepted answer is doing:

In [23]: arr[:,0] = arr1.tolist()                                                                
In [24]: arr[:,1] = arr2.tolist()                                                                
In [25]: arr[:,2] = arr3.tolist()                                                                
In [26]: arr                                                                                     
Out[26]: 
array([[list(['a', 'b', 'c']), list(['g', 'h', 'i', 'j', 'k']),
        list(['r', 's'])],
       [list(['d', 'e', 'f']), list(['l', 'm', 'n', 'o', 'p']),
        list(['t', 'u'])]], dtype=object)

These difficulties in creating the desired array are a good indicator that this is not, NOT, a good numpy array structure. If it's hard to make, it will probably also be hard to use, or at least slow. Iterating over an object dtype array is slower than iterating over a list. About its only advantage over a list is that it is easy to reshape.
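Following that advice, a plain nested list built with zip may be the simplest container, and it handles any number of input arrays. A minimal sketch (`arrays` and `nested` are illustrative names):

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count

# zip(*...) walks the arrays row by row; each output row holds one list per input array.
nested = [list(row) for row in zip(*(a.tolist() for a in arrays))]
print(nested)
# [[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']], [['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]]
```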

====

np.array does work if the inputs are lists instead of arrays (on NumPy 1.24 and later, ragged input like this also needs an explicit dtype=object):

In [33]: np.array([arr1.tolist(), arr2.tolist(), arr3.tolist()], dtype=object)
Out[33]: 
array([[list(['a', 'b', 'c']), list(['d', 'e', 'f'])],
       [list(['g', 'h', 'i', 'j', 'k']), list(['l', 'm', 'n', 'o', 'p'])],
       [list(['r', 's']), list(['t', 'u'])]], dtype=object)

or convert to a list to give a 'cleaner' display:

In [34]: _.tolist()                                                                              
Out[34]: 
[[['a', 'b', 'c'], ['d', 'e', 'f']],
 [['g', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p']],
 [['r', 's'], ['t', 'u']]]

and a transpose of that (3,2) array does give the desired (2,3) array:

In [35]: _33.T.tolist()                                                                          
Out[35]: 
[[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
 [['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]]
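The transpose approach extends to any number of arrays by building the list of `.tolist()` results in a comprehension. A minimal sketch, assuming NumPy 1.24 or later (hence the explicit `dtype=object`); `join_by_transpose` is a hypothetical name. The second call shows a caveat: if every input has the same number of columns, the nested lists are no longer ragged, and np.array builds a regular 3-D array instead of an array of lists, so the fill-an-empty-object-array approach is safer in that case.

```python
import numpy as np

def join_by_transpose(arrays):
    # Build a (len(arrays), n_rows) object array of row lists, then transpose it.
    return np.array([a.tolist() for a in arrays], dtype=object).T

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

ragged = join_by_transpose([arr1, arr2, arr3])
print(ragged.shape)      # (2, 3)

# Caveat: equal widths are not ragged, so NumPy descends into the inner lists.
same_width = join_by_transpose([arr3, arr3])
print(same_width.shape)  # (2, 2, 2), not (2, 2)
```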

Upvotes: 0

U13-Forward

Reputation: 71580

You could try:

print(np.array([[x, y, z] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))

Or if you want the inner rows as arrays as well:

print(np.array([np.array([x, y, z], dtype=object) for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())]))

Output:

[[['a', 'b', 'c'] ['g', 'h', 'i', 'j', 'k'] ['r', 's']]
 [['d', 'e', 'f'] ['l', 'm', 'n', 'o', 'p'] ['t', 'u']]]

And the shape is (2, 3) as expected.

Edit:

For a variable number of arrays, as you mentioned in the comment, try:

arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count
print(np.array([np.array(row, dtype=object) for row in zip(*[a.tolist() for a in arrays])]))

Upvotes: 3

codeblaze

Reputation: 73

I think this should give you the desired output. It's a modification of the answer given by @U10-Forward-ReinstateMonica, in which the inner elements were Python lists; here they are NumPy arrays.

print(np.array([[np.array(x), np.array(y), np.array(z)] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))

Upvotes: 1

ProteinGuy

Reputation: 1942

This may be a long way to do it, but it works:

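import numpy as np

arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count
arr_all = []
for i in range(arr1.shape[0]):
    # One cell per input array: the i-th row of that array
    arr_all.append([arr[i, :] for arr in arrays])
# dtype=object is required for ragged rows on NumPy >= 1.24
arr_all = np.array(arr_all, dtype=object)  # shape (n_rows, len(arrays))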

Upvotes: 1
