Reputation: 3385
I'm trying to add a new column to an empty NumPy array and am facing some troubles. I've looked at a lot of other questions, but for some reason they don't seem to be helping me solve the problem I'm facing, so I decided to ask my own question.
I have an empty NumPy array such that:
array1 = np.array([])
Let's say I have data that is of shape (100, 100)
, and want to append each column to array1
one by one. However, if I do for example:
array1 = np.append(array1, some_data[:, 0])
array1 = np.append(array1, some_data[:, 1])
I noticed that I won't be getting a (100, 2)
matrix, but a (200,)
array. So I tried to specify the axis
as
array1 = np.append(array1, some_data[:, 0], axis=1)
which produces a AxisError: axis 1 is out of bounds for array of dimension 1.
Next I tried to use the np.c_[]
method:
array1 = np.c_[array1, somedata[:, 0]]
which gives me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Is there any way that I would be able to add columns to the NumPy array sequentially?
Thank you.
EDIT
I learned that my initial question didn't contain enough information for others to offer help, and made this update to make up for the initial mistake.
My big objective is to make a program that selects features in a "greedy fashion." Basically, I'm trying to take the design matrix some_data
, which is a (100, 100)
matrix containing floating point numbers as entries, and fitting a linear regression model with an increasing number of features until I find the best set of features.
For example, since I have a total of 100 features, the first round would fit the model on each 100, select the best one and store it, then continue with the remaining 99.
That's what I'm trying to do in my head, but I got stuck from the beginning with the problem I mentioned.
Upvotes: 2
Views: 7397
Reputation: 231335
You start with a (0,) array and (n,) shaped one:
In [482]: arr1 = np.array([])
In [483]: arr1.shape
Out[483]: (0,)
In [484]: arr2 = np.array([1,2,3])
In [485]: arr2.shape
Out[485]: (3,)
np.append
uses concatenate
(but with some funny business when axis is not provided):
In [486]: np.append(arr1, arr2)
Out[486]: array([1., 2., 3.])
In [487]: np.append(arr1, arr2,axis=0)
Out[487]: array([1., 2., 3.])
In [489]: np.concatenate([arr1, arr2])
Out[489]: array([1., 2., 3.])
And trying axis=1
In [488]: np.append(arr1, arr2,axis=1)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
<ipython-input-488-457b8657453e> in <module>()
----> 1 np.append(arr1, arr2,axis=1)
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
4526 values = ravel(values)
4527 axis = arr.ndim-1
-> 4528 return concatenate((arr, values), axis=axis)
AxisError: axis 1 is out of bounds for array of dimension 1
Look at the whole message - the error occurs in the concatenate
step. You can't concatenate 1d arrays along axis=1
.
Using np.append
or even np.concatenate
iteratively is slow (it creates a new array each time), and hard to initialize correctly. It is a poor substitute for the widely use list append-to-empty-list
recipe.
np.c_
is also just a cover function for concatenate
.
There isn't just one empty
array. np.array([[]])
and np.array([[[]]])
also have 0 elements.
If you want to add a column to an array, you need to start with a 2d array, and the column also needs to be 2d.
Here's an example of a proper concatenation of 2 2d arrays:
In [490]: np.concatenate([ np.zeros((3,0),int), np.arange(3)[:,None]], axis=1)
Out[490]:
array([[0],
[1],
[2]])
column_stack
is another cover function for concatenate
that makes sure the inputs are 2d. But even with that getting an initial 'empty' array is tricky.
In [492]: np.column_stack([np.zeros(3,int), np.arange(3)])
Out[492]:
array([[0, 0],
[0, 1],
[0, 2]])
In [493]: np.column_stack([np.zeros((3,0),int), np.arange(3)])
Out[493]:
array([[0],
[1],
[2]])
np.c_
is a lot like column_stack
, though implemented in a different way:
In [496]: np.c_[np.zeros(3,int), np.arange(3)]
Out[496]:
array([[0, 0],
[0, 1],
[0, 2]])
The basic message is, that when using np.concatenate
you need to pay attention to dimensions. Its variants allow you to fudge things a bit, but you really need to understand that fudging to get things right, especially when starting from this poorly defined idea of a 'empty' array.
Upvotes: 4
Reputation: 2078
I usually use concatenate method and do it like this:
# Some stuff
alldata = None
....
array1 = np.random.random((100,1))
if alldata is None: alldata = array1
...
array2 = np.random.random((100,1))
alldata = np.concatenate((alldata,array2),axis=1)
In case, you are working with vectors:
alldata = None
....
array1 = np.random.random((100,))
if alldata is None: alldata = array1[:,np.newaxis]
...
array2 = np.random.random((100,))
alldata = np.concatenate((alldata,array2[:,np.newaxis]),axis=1)
Upvotes: 1