Reputation: 107
I want to be able to convert an existing 2D array to a 1D array of arrays. The only way I can find is to use something like:
import numpy as np

my_2d_array = np.random.random((5, 3))
my_converted_array = np.zeros(len(my_2d_array), dtype='O')
# Iterate over the rows of the source array, not the (still empty) target
for i, row in enumerate(my_2d_array):
    my_converted_array[i] = row
Is there a faster/cleaner method of doing this?
If the inner arrays have different shapes it is possible, for example:
my_1d_array = np.array([
    np.array([0, 1], dtype=float),
    np.array([2], dtype=float)
], dtype='O')
assert my_1d_array.shape == (2,)
But if the arrays are the same length, numpy automatically makes it a 2D array:
my_2d_array = np.array([
    np.array([0, 1], dtype=float),
    np.array([2, 3], dtype=float)
], dtype='O')
assert my_2d_array.shape == (2, 2)
EDIT: To clarify for some answers, I can't use flatten, reshape or ravel, as they would maintain the same number of elements. Instead I want to go from a 2D array with shape (N, M) to a 1D array with shape (N,) of objects (1D arrays), each of which has shape (M,).
Upvotes: 3
Views: 1241
Reputation: 53029
Here's one method using np.frompyfunc that is a bit less typing than yours and comparable in speed. It seems roughly the same for small arrays but faster for large ones:
>>> import numpy as np
>>>
>>> def f_empty(a):
...     n = len(a)
...     b = np.empty((n,), dtype=object)
...     for i in range(n):
...         b[i] = a[i]
...     return b
...
>>> def f_fpf(a):
...     n = len(a)
...     return np.frompyfunc(a.__getitem__, 1, 1)(np.arange(n))
...
>>> def f_fpfl(a):
...     n = len(a)
...     return np.frompyfunc(list(a).__getitem__, 1, 1)(np.arange(n))
...
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=10000)
>>> a = np.random.random((10, 20))
>>> repeat('f_fpf(a)', **kwds)
[0.04216550011187792, 0.039600114803761244, 0.03954345406964421]
>>> repeat('f_fpfl(a)', **kwds)
[0.05635825078934431, 0.04677496198564768, 0.04691878380253911]
>>> repeat('f_empty(a)', **kwds)
[0.04288528114557266, 0.04144620103761554, 0.041292963083833456]
>>> a = np.random.random((100, 200))
>>> repeat('f_fpf(a)', **kwds)
[0.20513887284323573, 0.2026138547807932, 0.20201953873038292]
>>> repeat('f_fpfl(a)', **kwds)
[0.21277308696880937, 0.18629810912534595, 0.18749701930209994]
>>> repeat('f_empty(a)', **kwds)
[0.2321561980061233, 0.24220682680606842, 0.22897077212110162]
>>> a = np.random.random((1000, 2000))
>>> repeat('f_fpf(a)', **kwds)
[2.1829855730757117, 2.1375885657034814, 2.1347726942040026]
>>> repeat('f_fpfl(a)', **kwds)
[1.8276268909685314, 1.8227900266647339, 1.8233762909658253]
>>> repeat('f_empty(a)', **kwds)
[2.5640305397100747, 2.565472401212901, 2.4353492129594088]
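To make it clearer what the np.frompyfunc call is doing, here is a minimal sketch on a small array. The names a, get_row and rows are only illustrative. np.frompyfunc wraps a Python callable as a ufunc that always returns object-dtype output, so calling it on the row indices gives one object per row:

```python
import numpy as np

a = np.arange(12.0).reshape(4, 3)

# Wrap row lookup as a ufunc with 1 input and 1 output.
# Ufuncs built by np.frompyfunc return object-dtype arrays.
get_row = np.frompyfunc(a.__getitem__, 1, 1)

# Apply it to the row indices 0..3; each result element is one row of a.
rows = get_row(np.arange(len(a)))

print(rows.shape, rows.dtype)
print(rows[1])
```

The result has shape (4,) and dtype object, and each element is a 1D array of length 3.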
Upvotes: 2
Reputation: 231355
In [136]: arr = np.arange(15).reshape(5,3)
In [137]: arr1 = np.empty(5, object)
Direct assignment doesn't work:
In [138]: arr1[:] = arr
...
ValueError: could not broadcast input array from shape (5,3) into shape (5)
Breaking arr into a list of rows does:
In [139]: arr1[:] = list(arr)
In [140]: arr1
Out[140]:
array([array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]),
array([ 9, 10, 11]), array([12, 13, 14])], dtype=object)
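The same idea can be wrapped in a small helper. This is only a sketch, and the function name rows_to_object_array is made up for illustration:

```python
import numpy as np

def rows_to_object_array(arr):
    """Return a 1D object array whose elements are the rows of arr."""
    out = np.empty(len(arr), dtype=object)
    # Assign from a list of rows rather than from the 2D array itself,
    # which would raise the broadcast error shown above.
    out[:] = list(arr)
    return out

arr = np.arange(15).reshape(5, 3)
converted = rows_to_object_array(arr)
print(converted.shape, converted.dtype)
print(converted[2])
```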
I'm not too surprised that your original is competitive in speed:
In [141]: for i,row in enumerate(arr):
     ...:     arr1[i] = row
arr1 contains pointers just like the list:
In [143]: list(arr)
Out[143]:
[array([0, 1, 2]),
array([3, 4, 5]),
array([6, 7, 8]),
array([ 9, 10, 11]),
array([12, 13, 14])]
Operations on an object array nearly always require iteration and/or object referencing. The only operations that run as fast as their numeric-array counterparts are those that don't do anything with the contents, like reshape and slice.
I found in other time tests that iteration on an object array is faster than iteration on the rows of an array, but still a bit slower than iteration on a list.
I have often made an array like this, but not in 'production' sizes. Posters often want to go the other direction, converting an object array to 2d, so I have used this to replicate their example. Posters usually get an object array like this from something else, such as a Pandas dataframe, or some machine learning code that uses the object array for generality.
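For that other direction, one possible approach (my choice here, not something the posters necessarily use) is np.stack, which works when all the inner arrays have the same shape. A small sketch with made-up names obj and restored:

```python
import numpy as np

# Build a 1D object array holding three equal-length 1D arrays.
obj = np.empty(3, dtype=object)
for i in range(3):
    obj[i] = np.arange(4) + 10 * i

# Stack the inner arrays along a new first axis to get a numeric 2D array.
restored = np.stack(list(obj))
print(restored.shape, restored.dtype)
```

The result has shape (3, 4) and a numeric dtype rather than object.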
Upvotes: 1
Reputation: 415
There are methods like ravel, flatten and reshape to do the job. Learn the difference between them here in this link.
Using ravel or flatten:
my_1d_array = my_2d_array.flatten()  # Returns shape (15,)
my_1d_array = my_2d_array.ravel()    # Returns shape (15,)
Such a (15,) shape may cause inconsistencies in some matrix operations, leading to inconsistent results or program errors.
So I suggest using reshape as follows:
my_1d_array = my_2d_array.reshape((-1, 1))  # Returns shape (15, 1)
or,
my_1d_array = my_2d_array.reshape((1, -1))  # Returns shape (1, 15)
Reshaping into (x, y) this way ensures matrix operations give consistent results without such errors.
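As a quick check of the shapes quoted above, here is a short sketch assuming a (5, 3) input like the one in the question. Note that every result still holds the same 15 numeric elements:

```python
import numpy as np

my_2d_array = np.random.random((5, 3))

flat = my_2d_array.flatten()             # copy, shape (15,)
rav = my_2d_array.ravel()                # view where possible, shape (15,)
column = my_2d_array.reshape((-1, 1))    # shape (15, 1)
row = my_2d_array.reshape((1, -1))       # shape (1, 15)

for name, result in [("flatten", flat), ("ravel", rav),
                     ("reshape(-1, 1)", column), ("reshape(1, -1)", row)]:
    print(name, result.shape, result.size)
```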
Upvotes: 0