Reputation: 949
I have the following 3 NumPy arrays:
import numpy as np
arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
I would like to join them column-wise, but have them maintain separation between items coming from each array, like so:
Output:
array([[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
[['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]], dtype=object)
However, I cannot find a NumPy function that achieves this. The closest I got was a plain np.concatenate(), but the output does not retain the separation I want:
Input: np.concatenate([arr1, arr2, arr3], axis = 1)
Output:
array([['a', 'b', 'c', 'g', 'h', 'i', 'j', 'k', 'r', 's'],
['d', 'e', 'f', 'l', 'm', 'n', 'o', 'p', 't', 'u']], dtype='<U1')
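If a flat 2-D array is acceptable, one possible workaround is to keep the concatenated result and remember where each input ends: np.split with the cumulative column widths recovers the separate pieces. This is a minimal sketch; the names arrays, flat, bounds and pieces are illustrative only.

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count

flat = np.concatenate(arrays, axis=1)  # shape (2, 10); boundaries are lost here
# Column offsets where each input ends, excluding the last one: [3, 8]
bounds = np.cumsum([a.shape[1] for a in arrays])[:-1]
# Split the flat array back into (2, 3), (2, 5) and (2, 2) pieces
pieces = np.split(flat, bounds, axis=1)

print(pieces[1][0])  # ['g' 'h' 'i' 'j' 'k']
```

This keeps a regular string array for fast NumPy operations and only materializes the per-array pieces when they are needed.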
Any suggestions on how I can achieve the desired effect?
UPDATE: Thank you for some great answers. As an added level of difficulty, I would also like the solution to handle a variable number of input arrays, which would still share the same number of rows: sometimes there would be 3, other times e.g. 6.
Upvotes: 4
Views: 407
Reputation: 231395
In [13]: arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
...: arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
...: arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
If I try to make an object dtype array from these arrays, I get an error:
In [22]: np.array([arr1, arr2, arr3])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-155b98609c5b> in <module>
----> 1 np.array([arr1, arr2, arr3])
ValueError: could not broadcast input array from shape (2,3) into shape (2)
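If they differed in number of rows, this would work (producing an object array; NumPy 1.24 and later require passing dtype=object explicitly for such ragged input), but with a common row number the result is an error. In such a case, I usually recommend defining an object array of the right size, and filling that: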
In [14]: arr = np.empty((2,3), object)
In [15]: arr
Out[15]:
array([[None, None, None],
[None, None, None]], dtype=object)
But if I try to assign the first column, I get the same error:
In [17]: arr[:,0] = arr1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-9894797aa09e> in <module>
----> 1 arr[:,0] = arr1
ValueError: could not broadcast input array from shape (2,3) into shape (2)
I can instead assign row by row:
In [18]: arr[0,0] = arr1[0]
In [19]: arr[1,0] = arr1[1]
In [20]: arr[0,1] = arr2[0]
...
In [21]: arr
Out[21]:
array([[array(['a', 'b', 'c'], dtype='<U1'),
array(['g', 'h', 'i', 'j', 'k'], dtype='<U1'), None],
[array(['d', 'e', 'f'], dtype='<U1'), None, None]], dtype=object)
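The row-by-row assignment above can be written as a loop, which also covers a variable number of input arrays. A minimal sketch; ragged_rows is a hypothetical helper name, and it assumes every input is 2-D with the same number of rows.

```python
import numpy as np

def ragged_rows(arrays):
    """Hypothetical helper: return an object array of shape (nrows, len(arrays))
    whose [i, j] element is row i of arrays[j], as a 1-D ndarray."""
    nrows = arrays[0].shape[0]
    if any(a.shape[0] != nrows for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    out = np.empty((nrows, len(arrays)), dtype=object)
    for j, a in enumerate(arrays):
        for i in range(nrows):
            # Assigning to a single element stores the row array as one
            # object, so there is no attempt to broadcast it.
            out[i, j] = a[i]
    return out

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

res = ragged_rows([arr1, arr2, arr3])
print(res.shape)   # (2, 3)
print(res[1, 1])   # ['l' 'm' 'n' 'o' 'p']
```

The double loop is plain Python, so its cost grows with rows times arrays; that is the price of storing separately sized rows in one array.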
Alternatively, we can assign nested lists to the columns, which avoids the broadcast error. This is effectively what the accepted answer is doing:
In [23]: arr[:,0] = arr1.tolist()
In [24]: arr[:,1] = arr2.tolist()
In [25]: arr[:,2] = arr3.tolist()
In [26]: arr
Out[26]:
array([[list(['a', 'b', 'c']), list(['g', 'h', 'i', 'j', 'k']),
list(['r', 's'])],
[list(['d', 'e', 'f']), list(['l', 'm', 'n', 'o', 'p']),
list(['t', 'u'])]], dtype=object)
These difficulties in creating the desired array are a good indicator that this is not, NOT, a good numpy array structure. If it's hard to make, it probably will also be hard to use, or at least slow. Iteration on an object dtype array is slower than iteration on a list. About its only advantage compared to a list is that it is easy to reshape.
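Given that, a plain nested list may serve just as well. A minimal sketch, assuming a list of per-row groups is acceptable in place of an array: zip(*...) walks the rows of however many arrays are supplied in lockstep.

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)
arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count

# Each tuple from zip holds row i of every input array, still as separate lists
rows = [list(group) for group in zip(*(a.tolist() for a in arrays))]

print(rows[0])  # [['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']]
```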
====
np.array does work if the inputs are lists instead of arrays (in NumPy 1.24 and later, add dtype=object to allow the ragged result):
In [33]: np.array([arr1.tolist(), arr2.tolist(), arr3.tolist()])
Out[33]:
array([[list(['a', 'b', 'c']), list(['d', 'e', 'f'])],
[list(['g', 'h', 'i', 'j', 'k']), list(['l', 'm', 'n', 'o', 'p'])],
[list(['r', 's']), list(['t', 'u'])]], dtype=object)
or convert to a list to give a 'cleaner' display:
In [34]: _.tolist()
Out[34]:
[[['a', 'b', 'c'], ['d', 'e', 'f']],
[['g', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p']],
[['r', 's'], ['t', 'u']]]
and a transpose of that (3,2) array does give the desired (2,3) layout:
In [35]: _33.T.tolist()
Out[35]:
[[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
[['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]]
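The list-then-transpose approach generalizes to a variable number of arrays. A minimal sketch; stack_then_transpose is a hypothetical helper name. One caveat: if every input has the same number of columns, the nested lists are no longer ragged and np.array builds a 3-D array instead of one holding lists, so the sketch checks for that.

```python
import numpy as np

def stack_then_transpose(arrays):
    """Hypothetical helper: object array of shape (nrows, len(arrays)) holding row lists."""
    # dtype=object is needed in NumPy >= 1.24 to allow the ragged nesting
    tmp = np.array([a.tolist() for a in arrays], dtype=object)  # (len(arrays), nrows)
    if tmp.ndim != 2:
        # All column widths were equal, so NumPy descended into the row lists
        raise ValueError("expected arrays with differing column counts")
    return tmp.T

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

res = stack_then_transpose([arr1, arr2, arr3])
print(res.tolist()[0])  # [['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']]
```

Preallocating with np.empty and filling, as earlier in this answer, avoids the equal-width caveat entirely.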
Upvotes: 0
Reputation: 71580
You could try:
print(np.array([[x, y, z] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))
Or if you want the inner rows as arrays as well:
print(np.array([np.array([x, y, z], dtype=object) for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())]))
Output:
[[['a', 'b', 'c'] ['g', 'h', 'i', 'j', 'k'] ['r', 's']]
[['d', 'e', 'f'] ['l', 'm', 'n', 'o', 'p'] ['t', 'u']]]
And the shape is (2, 3) as expected.
Edit:
For the variable number of arrays you mentioned in the comment, try:
l = [arr1, arr2, arr3]  # list of the arrays; any number of them works
print(np.array([np.array(row, dtype=object) for row in zip(*[a.tolist() for a in l])]))
Upvotes: 3
Reputation: 73
I think this should give you the desired output. It's a modification of the answer given by @U10-Forward-ReinstateMonica, where the inner elements were Python lists; here they are NumPy arrays:
print(np.array([[np.array(x), np.array(y), np.array(z)] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))
Upvotes: 1
Reputation: 1942
This may be a long way to do it, but it works:
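arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count
arr_all = []
for i in range(arr1.shape[0]):
    # one output row: the i-th row of each input array, kept separate
    row = [arr[i, :] for arr in arrays]
    arr_all.append(row)
# dtype=object keeps the differently sized rows as separate elements
arr_all = np.array(arr_all, dtype=object)  # shape (2, 3)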
Upvotes: 1