Reputation: 6855
Is there a quick way to "sub-flatten" or flatten only some of the first dimensions in a numpy array?
For example, given a numpy array of dimensions (50,100,25), the resultant dimensions would be (5000,25).
Upvotes: 250
Views: 247977
Reputation: 555
numpy.vstack is perfect for this situation:
import numpy as np
arr = np.ones((50,100,25))
np.vstack(arr).shape
> (5000, 25)
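For this particular case (merging the first two axes), vstack and reshape should give the same result. A minimal check, assuming a small array with distinct values so that any reordering would show up:

```python
import numpy as np

# Small array with distinct values, so any reordering would be visible.
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)

stacked = np.vstack(arr)                     # stacks the 2 blocks of shape (3, 4)
reshaped = arr.reshape(-1, arr.shape[-1])    # merges the first two axes

print(stacked.shape)                         # (6, 4)
print(np.array_equal(stacked, reshaped))     # True
```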
I prefer to use stack, vstack or hstack over reshape because reshape just reads the elements in row-major (C) order and pours them into the desired shape, regardless of which axis each value came from. This can be problematic if you are, e.g., going to take column averages.
Here's an illustration of what I mean. Suppose we have the following array:
>>> arr.shape
(2, 3, 4)
>>> arr
array([[[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[7, 7, 7, 7],
        [7, 7, 7, 7],
        [7, 7, 7, 7]]])
We apply both methods to get an array of shape (3,8):
>>> arr.reshape((3,8)).shape
(3, 8)
>>> np.hstack(arr).shape
(3, 8)
However, if we look at how the values have been arranged in each case, the hstack result lets us take column sums that we could also have calculated from the original array. With reshape this isn't possible, because its columns mix values from both blocks of the original array.
>>> arr.reshape((3,8))
array([[1, 2, 3, 4, 1, 2, 3, 4],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [7, 7, 7, 7, 7, 7, 7, 7]])
>>> np.hstack(arr)
array([[1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7]])
Upvotes: 19
Reputation: 61305
An alternative approach is to use numpy.resize(), as in:
In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)
# sanity check with other solutions
In [47]: resized = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, resized)
Out[48]: True
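One caveat worth keeping in mind: np.resize and np.reshape agree here only because the total number of elements is unchanged. When the requested size differs, np.resize repeats (or truncates) the data instead of raising an error. A small sketch with a made-up 4-element array:

```python
import numpy as np

a = np.arange(4)

bigger = np.resize(a, 6)      # cycles through the data to fill 6 slots
print(bigger.tolist())        # [0, 1, 2, 3, 0, 1]

try:
    a.reshape(6)              # reshape refuses a size mismatch
except ValueError as exc:
    print("reshape failed:", exc)
```

If a silent size mismatch would be a bug in your code, reshape is the safer choice of the two.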
Upvotes: 6
Reputation: 1345
A slight generalization to Peter's answer -- you can specify a range over the original array's shape if you want to go beyond three-dimensional arrays.
e.g. to flatten all but the last two dimensions:
import numpy
arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)
EDIT: A slight generalization to my earlier answer -- you can, of course, also specify a range at the beginning of the reshape too:
arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)
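The same pattern can be wrapped in a small function. This is only a sketch: merge_axes is a hypothetical helper name, not a NumPy function, and it assumes non-negative start and stop indices and a non-empty array.

```python
import numpy as np

def merge_axes(arr, start, stop):
    """Collapse axes start..stop-1 of arr into a single axis (hypothetical helper)."""
    shape = arr.shape
    # Keep the leading and trailing axes, let -1 absorb everything in between.
    return arr.reshape(*shape[:start], -1, *shape[stop:])

arr = np.zeros((3, 4, 5, 6, 7, 8))
print(merge_axes(arr, 2, 4).shape)   # (3, 4, 30, 7, 8)
print(merge_axes(arr, 0, 2).shape)   # (12, 5, 6, 7, 8)
```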
Upvotes: 77
Reputation: 12775
Take a look at numpy.reshape.
>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)
>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape
# (5000, 25)
# One shape dimension can be -1.
# In this case, the value is inferred from
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)
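To see where each original element ends up, here is a small check, assuming an array filled with distinct values (the indices i and j are arbitrary picks for illustration):

```python
import numpy as np

arr = np.arange(50 * 100 * 25).reshape(50, 100, 25)
flat = arr.reshape(-1, arr.shape[-1])

i, j = 7, 42
# Row i * 100 + j of the flattened array is the original arr[i, j].
print(np.array_equal(flat[i * 100 + j], arr[i, j]))   # True

# For a contiguous array like this one, reshape returns a view, not a copy.
print(np.shares_memory(arr, flat))                    # True
```

Because the result is a view in this case, writing to flat also changes arr; call .copy() on the result if that is not what you want.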
Upvotes: 219
Reputation: 13475
A slight generalization to Alexander's answer - np.reshape can take -1 as an argument, meaning "total array size divided by product of all other listed dimensions":
e.g. to flatten all but the last dimension:
>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)
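The arithmetic behind the -1 can be spelled out explicitly. A short sketch (the variable name inferred is just for illustration), including what happens when the division does not come out even:

```python
import numpy as np

arr = np.zeros((50, 100, 25))

inferred = arr.size // arr.shape[-1]          # 125000 // 25
print(inferred)                               # 5000
print(arr.reshape(-1, arr.shape[-1]).shape)   # (5000, 25)

try:
    arr.reshape(-1, 7)                        # 125000 is not divisible by 7
except ValueError as exc:
    print("cannot infer:", exc)
```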
Upvotes: 123