Reputation: 6879
Given:
test = np.array([[1, 2], [3, 4], [5, 6]])
test[i]
gives the ith row (e.g. [1, 2]). How do I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?
Upvotes: 683
Views: 1256304
Reputation: 35088
With:
test = np.array([[1, 2], [3, 4], [5, 6]])
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
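To make the comparison with a loop concrete, here is a minimal sketch. The helper name column_by_loop is hypothetical and exists only to show what the slice replaces; it is not part of NumPy.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

def column_by_loop(arr, i):
    # Hypothetical helper: collect column i one element at a time.
    return np.array([row[i] for row in arr])

i = 1
col_slice = test[:, i]              # one indexing operation, no Python loop
col_loop = column_by_loop(test, i)  # same values, built element by element

print(col_slice)  # [2 4 6]
print(col_loop)   # [2 4 6]
```

Both give the same values; the slice simply avoids the per-element Python overhead.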
Upvotes: 1003
Reputation: 1
I just want to point out that harmand's comment under mtrw's top-scoring answer is confusing. He says:
This create a copy, is it possible to get reference, like I get a reference to a column, any change in this reference is reflected in the original array.
In fact, the slice is not a copy. This code
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
barr = arr[:, 1]
print(barr)
barr[1] = 8
print(arr)
prints out
[2 4 6]
[[1 2]
 [3 8]
 [5 6]]
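One way to confirm this without relying on printed output is np.shares_memory, which reports whether two arrays overlap in memory. A small sketch using the same arr and barr as above:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
barr = arr[:, 1]

# True: barr reuses arr's buffer, so it is a view rather than a copy.
print(np.shares_memory(arr, barr))

barr[1] = 8
print(arr[1, 1])  # 8
```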
I would appreciate it if someone could note this in the comments under mtrw's answer, as my reputation is still too low to comment.
Upvotes: 0
Reputation: 23001
This question has already been answered, but here is a note on views vs. copies.
If the array is indexed using a scalar (regular indexing), the result is a view (x below), which means any change made to x will be reflected in test, because x is just a different view of test.
test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, 1]
x[:] = 100 # <---- this does affect test
test
array([[ 1, 100],
[ 3, 100],
[ 5, 100]])
However, if the array is indexed using a list/array-like (advanced indexing), the result is a copy, which means any changes to x will not affect test.
test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, [1]]
x[:] = 100 # <---- this does not affect test
test
array([[1, 2],
[3, 4],
[5, 6]])
In general, using a slice to index will return a view:
test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, :2]
x[:] = 100
test
array([[100, 100],
[100, 100],
[100, 100]])
but using an array to index will return a copy:
test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, np.r_[:2]]
x[:] = 100
test
array([[1, 2],
[3, 4],
[5, 6]])
Regular indexing is extremely fast and advanced indexing is much slower (that said, it's still almost instantaneous and it certainly will not be a bottleneck in the program).
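One related point worth sketching: the copy is only made when the advanced index is used to read. Assigning through the same index on the left-hand side writes into the original array. A minimal example, reusing test from above:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

# Reading with a list index gives a copy, so this leaves test alone.
x = test[:, [1]]
x[:] = 100
print(test[:, 1])  # [2 4 6]

# Assigning through the same index on the left-hand side
# modifies test itself.
test[:, [1]] = 100
print(test[:, 1])  # [100 100 100]
```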
Upvotes: 0
Reputation: 95
This is a 2-dimensional array, so you can slice the second axis to access whichever columns you want.
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # a and b are the start (inclusive) and stop (exclusive) column indices
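For instance, with concrete values for a and b (the 2x4 array here is just an assumed example, chosen so the slice has something to skip):

```python
import numpy as np

test = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

a, b = 1, 3            # example values: start (inclusive), stop (exclusive)
cols = test[:, a:b]    # columns 1 and 2
print(cols)
# [[1 2]
#  [5 6]]
```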
Upvotes: 4
Reputation: 9796
Although the question has been answered, let me mention some nuances.
Let's say you are interested in the column at index 1 (the second column) of the array
arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])
As you already know from other answers, to get it in the form of a "row vector" (a 1-D array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the number of bytes stepped over when traversing each of them is different:
# values assume a 4-byte int32 dtype; with 8-byte int64 they are 16 and 8
arr_col1_view.strides[0] # 8 bytes
arr_col1_copy.strides[0] # 4 bytes
Why is this important? Imagine that you have a very big array A instead of arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different stride lengths mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, that is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are OK with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array. For the original row-major A:
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
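The stride behavior described above can be checked on a small array. This is a sketch with an explicitly chosen int32 dtype and a 5x3 shape (both assumptions made so the numbers are easy to read and do not depend on the platform's default integer size):

```python
import numpy as np

# 4 bytes per item, 3 items per row -> 12 bytes per row.
A = np.zeros((5, 3), dtype=np.int32)   # row-major ('C') by default
print(A.strides)                # (12, 4)
print(A[:, 1].strides)          # (12,)   view: jumps a whole row each step
print(A[:, 1].copy().strides)   # (4,)    copy: contiguous
print(A.T.strides)              # (4, 12) transpose just swaps the strides

F = np.asfortranarray(A)        # column-major ('F') layout
print(F.strides)                # (4, 20)
print(F[:, 1].strides)          # (4,)    column view is now contiguous
```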
Upvotes: 25
Reputation: 1411
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array (a "row vector" of shape (3,)). If you just want to loop over it, that's fine, but if you want to hstack it with another array of shape 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector (shape (3, 1)), so that you can use it in a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
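There are a few other ways to get the same (3, 1) shape. The sketch below lists some that I am aware of; the variable names are only illustrative. Note that the slice form returns a view, while the list form returns a copy.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_list = test[:, [0]]                  # advanced indexing: a copy
col_slice = test[:, 0:1]                 # length-1 slice: a view
col_newaxis = test[:, 0][:, np.newaxis]  # add an axis to the 1-D result
col_reshape = test[:, 0].reshape(-1, 1)  # same idea via reshape

stacked = np.hstack((test, col_slice))
print(stacked)
# [[1 2 1]
#  [3 4 3]
#  [5 6 5]]
```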
Upvotes: 105
Reputation: 1077
To get several independent columns, just use:
> test[:,[0,2]]
You will get columns 0 and 2.
Upvotes: 9
Reputation: 10190
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Upvotes: 25
Reputation: 177
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd through 4th columns this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
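As a possible shortcut, a negative stop index counts from the end, so the same selection can be written without computing ncol. A small sketch on the same 2x5 array:

```python
import numpy as np

test = np.arange(10).reshape(2, 5)
ncol = test.shape[1]

explicit = test[:, 1:(ncol - 1)]
negative = test[:, 1:-1]   # -1 means "one before the end"

print(np.array_equal(explicit, negative))  # True
```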
Upvotes: 4
Reputation: 86128
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
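The index list does not have to be sorted, and a boolean mask works as well. A short sketch of both on the same 3x3 array (the names reordered, mask and masked are just for illustration):

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

# The list order determines the column order in the result.
reordered = test[:, [2, 0]]
print(reordered)
# [[2 0]
#  [5 3]
#  [8 6]]

# A boolean mask with one entry per column keeps the True columns.
mask = np.array([True, False, True])
masked = test[:, mask]
print(masked)
# [[0 2]
#  [3 5]
#  [6 8]]
```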
Upvotes: 93