Reputation: 4953
I'm having trouble understanding how numpy stores its data. Consider the following:
>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in xrange(6): a.itemset(i, i+1)
...
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
This says that a is column major (F_CONTIGUOUS), so internally a should look like the following:
[1, 4, 2, 5, 3, 6]
This is just what is stated in this glossary. What confuses me is that if I try to access the data of a in a linear fashion, I instead get:
>>> for i in xrange(6): print a.item(i)
...
1.0
2.0
3.0
4.0
5.0
6.0
At this point I'm not sure what the F_CONTIGUOUS flag tells us, since it does not honor the ordering. Apparently everything in Python is row major, and when we want to iterate in a linear fashion we can use the iterator flat.
The question is the following: given a list of numbers, say 1, 2, 3, 4, 5, 6, how can we create a numpy array of shape (2, 3) in column major order? That is, how can I get a matrix that looks like this:
array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])
I would really like to be able to iterate linearly over the list and place its elements into the newly created ndarray. The reason is that I will be reading files of multidimensional arrays stored in column major order.
Upvotes: 45
Views: 85593
Reputation: 419
Very old question, but I feel the answer is missing.
First, a couple of functions that have not been mentioned yet:
a = np.ascontiguousarray(in_arr)
b = np.asfortranarray(in_arr)
However, they will not help with your problem. What will help:
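import numpy as np

a = np.ndarray(shape=(2, 3), order='F')

def memory_index(*args, x):
    # Offset, in elements, of x[args] in x's buffer, e.g. memory_index(1, 2, x=a) == 5.
    idx = (np.array(x.strides) // x.itemsize).dot(np.array(args))
    return int(idx)

# a is F-contiguous, so this ravel returns a view and writes reach a.
flat_view = a.ravel(order='A')  # or order='F' to be explicit
for i, value in enumerate([1, 2, 3, 4, 5, 6]):
    flat_view[i] = value
print(a)
# [[1. 3. 5.]
#  [2. 4. 6.]]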
Obviously, factor the repetitive work out of memory_index, use simple arithmetic instead of the dot function, and it might just be fast enough to be worth it.
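The suggestion above can be sketched roughly as follows. The helper names element_strides and memory_offset are made up for this illustration, and the sketch assumes an array that owns a contiguous buffer with non-negative strides:

```python
import numpy as np

def element_strides(x):
    """Strides of x measured in elements rather than bytes."""
    return tuple(s // x.itemsize for s in x.strides)

def memory_offset(index, elem_strides):
    """Position of x[index] in x's buffer, counted in elements."""
    return sum(i * s for i, s in zip(index, elem_strides))

a = np.empty((2, 3), order='F')
flat_view = a.ravel(order='F')   # a view, because a is F-contiguous
flat_view[:] = [1, 2, 3, 4, 5, 6]

strides = element_strides(a)     # computed once, reused for every lookup
offsets = {idx: memory_offset(idx, strides) for idx in np.ndindex(a.shape)}
```

Here the stride division happens once per array instead of once per lookup, and each lookup is a plain sum of products.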
Upvotes: 1
Reputation: 564
Wanted to add this in the comments but my rep is too low:
While Kill Console's answer gave the OP's required solution, I think it's important to note that as stated in the numpy.reshape() documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html):
Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.
so even if the view is column-wise, the data itself may not be, which may lead to inefficiencies in calculations which benefit from the data being stored column-wise in memory. Perhaps:
a = np.array(np.array([1, 2, 3, 4, 5, 6]).reshape(2,3,order='F'), order='F')
provides more of a guarantee that the data is stored column-wise (see order argument description at https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html).
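One way to check the layout rather than assume it is to inspect the flags. The sketch below uses np.asfortranarray as an alternative to the nested np.array call, which is my substitution and not part of the original suggestion:

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]
reshaped = np.array(values).reshape(2, 3, order='F')
forced = np.asfortranarray(reshaped)   # copies only if not already F-contiguous

print(reshaped.flags['F_CONTIGUOUS'])  # layout of the reshape result
print(forced.flags['F_CONTIGUOUS'])    # layout after forcing Fortran order
print(forced)
```

If the first flag is already true, the second call should not need to copy anything.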
Upvotes: 10
Reputation: 4666
Here is a simple way to print the data in memory order, using the ravel() function:
>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in range(6): a.itemset(i, i+1)
>>> print(a.ravel(order='K'))
[ 1. 4. 2. 5. 3. 6.]
This confirms that the array is stored in Fortran order.
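On NumPy versions where itemset is not available, the same check can be sketched with the flat iterator instead. This is a variation of the snippet above, not something from the original answer:

```python
import numpy as np

a = np.empty((2, 3), order='F')
for i in range(6):
    a.flat[i] = i + 1           # flat indexes in logical (row-major) order

logical = a.ravel(order='C')    # elements in index order
in_memory = a.ravel(order='K')  # elements in the order they sit in memory
```

Comparing logical and in_memory shows the difference between how the array is indexed and how it is stored.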
Upvotes: 0
Reputation: 4039
Your question has been answered, but I thought I would add this to explain your observation: "At this point I'm not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering."
The item method doesn't directly access the data like you think it does. To do that, you should access the data attribute, which gives you the byte string.
An example:
c = np.array([[1,2,3],
[4,6,7]], order='C')
f = np.array([[1,2,3],
[4,6,7]], order='F')
Observe
print c.flags.c_contiguous, f.flags.f_contiguous
# True, True
and
print c.nbytes == len(c.data)
# True
Now let's print the contiguous data for both:
nelements = np.prod(c.shape)
bsize = c.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
bnum = c.data[i*bsize : (i+1)*bsize] # The element as a byte string.
print np.fromstring(bnum, dtype=c.dtype)[0], # Convert to number.
This prints:
1 2 3 4 6 7
which is what we expect since c is order 'C', i.e., its data is stored row-major contiguous.
On the other hand,
nelements = np.prod(f.shape)
bsize = f.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
bnum = f.data[i*bsize : (i+1)*bsize] # The element as a byte string.
print np.fromstring(bnum, dtype=f.dtype)[0], # Convert to number.
prints
1 4 2 6 3 7
which, again, is what we expect to see since f's data is stored column-major contiguous.
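The loops above rely on Python 2 print statements and np.fromstring. A rough Python 3 equivalent, assuming contiguous arrays and using the made-up helper name memory_order, could look like this:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 6, 7]], order='C')
f = np.array([[1, 2, 3], [4, 6, 7]], order='F')

def memory_order(x):
    """Elements of a contiguous array x in the order they are stored."""
    raw = x.tobytes(order='A')   # 'A': F order if x is F-contiguous, else C
    return np.frombuffer(raw, dtype=x.dtype)

print(memory_order(c))
print(memory_order(f))
```

Note that tobytes copies the data, so this is meant for inspection rather than for fast access.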
Upvotes: 36
Reputation: 2023
By default, numpy stores data in row-major order.
>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a.shape
(2, 4)
>>> a.shape = 4,2
>>> a
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
If you change the shape, the order of the data does not change.
If you pass order='F' to reshape, you can get what you want.
>>> b = np.array([1, 2, 3, 4, 5, 6])
>>> b
array([1, 2, 3, 4, 5, 6])
>>> c = b.reshape(2,3,order='F')
>>> c
array([[1, 3, 5],
[2, 4, 6]])
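The same idea extends to more than two dimensions, which matters when reading column-major files. In the sketch below the shape and the values are placeholders, not data from the question:

```python
import numpy as np

shape = (2, 3, 4)                      # hypothetical dimensions
flat = np.arange(1, 25)                # stand-in for values read from a file
arr = flat.reshape(shape, order='F')   # first index varies fastest

# In column-major order, element (i, j, k) sits at this flat position.
i, j, k = 1, 2, 3
position = i + shape[0] * j + shape[0] * shape[1] * k
```

With order='F', arr[i, j, k] and flat[position] refer to the same value.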
Upvotes: 56
Reputation: 25833
In general, numpy uses order to describe the memory layout, but the Python-level behavior of an array should be the same regardless of its memory layout. I think you can get the behavior you want using views. A view is an array that shares memory with another array. For example:
import numpy as np
a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T
a[1] = 99
print(b)
# [[ 1 3 5]
# [99 4 6]]
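To confirm that b really is a view with a column-major layout, the flags and np.shares_memory can be checked. This is a small sketch added for illustration:

```python
import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T   # no data is copied here

shares = np.shares_memory(a, b)
layout = (b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])
print(shares, layout)
```

The transpose only swaps the strides, which is why b ends up F-contiguous over a's original buffer.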
Hope that helps.
Upvotes: 4