Reputation: 4953

numpy array row major and column major

I'm having trouble understanding how numpy stores its data. Consider the following:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in xrange(6): a.itemset(i, i+1)
... 
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This says that a is column major (F_CONTIGUOUS) thus, internally, a should look like the following:

[1, 4, 2, 5, 3, 6]

This is just what is stated in this glossary. What confuses me is that if I try to access the data of a in a linear fashion, I instead get:

>>> for i in xrange(6): print a.item(i)
... 
1.0
2.0
3.0
4.0
5.0
6.0

At this point I'm not sure what the F_CONTIGUOUS flag tells us, since it does not seem to honor the ordering. Apparently everything in Python is row major, and when we want to iterate in a linear fashion we can use the flat iterator.

The question is the following: given a list of numbers, say 1, 2, 3, 4, 5, 6, how can we create a numpy array of shape (2, 3) in column major order? That is, how can I get a matrix that looks like this:

array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

I would really like to be able to iterate linearly over the list and place the values into the newly created ndarray. The reason is that I will be reading files of multidimensional arrays stored in column major order.

Upvotes: 45

Answers (6)

dgpb

Reputation: 419

Very old question, but I feel the answer is missing.

Just to mention a couple functions that have not been mentioned:

a = np.ascontiguousarray(in_arr)
b = np.asfortranarray(in_arr)

However, they will not help with your problem. What will help:

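import numpy as np

a = np.ndarray(shape=(2,3), order='F')

def memory_index(*args, x):
    # Element offset of x[args] in memory: strides are in bytes, so divide by the item size.
    idx = (np.array(x.strides) // x.itemsize).dot(np.array(args))
    return int(idx)

# For an F-contiguous array this is a view in memory order, so writes land in a.
flat_view = a.ravel(order='A')   # or order='F' to be explicit
for i, value in enumerate([1, 2, 3, 4, 5, 6]):
    flat_view[i] = value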

print(a)

array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

Obviously, if you factor the repeated work out of memory_index and use simple arithmetic instead of the dot function, it might be fast enough to be worth it.
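A minimal sketch of that stride arithmetic with a plain sum instead of dot. The helper name memory_offset is made up for illustration; it only assumes that strides are given in bytes and that the array's strides are whole multiples of its item size, which holds for ordinary contiguous arrays.

```python
import numpy as np

def memory_offset(arr, *index):
    """Offset of arr[index] from the start of arr's data, counted in elements."""
    # strides are in bytes; dividing by itemsize converts them to element counts
    return sum(i * (s // arr.itemsize) for i, s in zip(index, arr.strides))

f = np.empty((2, 3), order='F')  # column-major layout
c = np.empty((2, 3), order='C')  # row-major layout

# Store in every element its own position in memory
for i in range(2):
    for j in range(3):
        f[i, j] = memory_offset(f, i, j)
        c[i, j] = memory_offset(c, i, j)

print(f)  # offsets grow down the columns: [[0, 2, 4], [1, 3, 5]]
print(c)  # offsets grow along the rows:   [[0, 1, 2], [3, 4, 5]]
```

The same indices map to different memory positions depending on the layout, which is all the order flag really changes.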

Upvotes: 1

KamKam

Reputation: 564

Wanted to add this in the comments but my rep is too low:

While Kill Console's answer gave the OP's required solution, I think it's important to note that as stated in the numpy.reshape() documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html):

Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

so even if the view is column-wise, the data itself may not be, which may lead to inefficiencies in calculations which benefit from the data being stored column-wise in memory. Perhaps:

a = np.array(np.array([1, 2, 3, 4, 5, 6]).reshape(2,3,order='F'), order='F')

provides more of a guarantee that the data is stored column-wise (see order argument description at https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html).
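If you want to check rather than trust the layout, the flags can be inspected directly. A small sketch, using np.asfortranarray as an alternative to the nested np.array call above (it should only copy when the input is not already Fortran-contiguous):

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6]

# Column-wise arrangement of the 1-D data; the docs do not promise a memory layout here
view = np.array(data).reshape(2, 3, order='F')

# Ask explicitly for a Fortran-contiguous array
fortran = np.asfortranarray(view)

print(view.flags['F_CONTIGUOUS'])     # whatever reshape happened to produce
print(fortran.flags['F_CONTIGUOUS'])  # True
print(fortran)
```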

Upvotes: 10

cfh

Reputation: 4666

Here is a simple way to print the data in memory order, by using the ravel() function:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in range(6): a.itemset(i, i+1)

>>> print(a.ravel(order='K'))
[ 1.  4.  2.  5.  3.  6.]

This confirms that the array is stored in Fortran order.
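As far as I know, itemset has been removed in recent NumPy releases, so here is a sketch of the same check using a.flat instead (np.empty is my substitution for np.ndarray; both give an uninitialized array). Like item and itemset, a.flat counts elements in row-major index order regardless of how the array is stored.

```python
import numpy as np

a = np.empty((2, 3), order='F')

# a.flat indexes elements in row-major (C) order, whatever the memory layout
for i in range(6):
    a.flat[i] = i + 1

logical = a.ravel(order='C')   # index order
physical = a.ravel(order='K')  # memory order

print(logical)   # values 1, 2, 3, 4, 5, 6
print(physical)  # values 1, 4, 2, 5, 3, 6
```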

Upvotes: 0

Matt Hancock

Reputation: 4039

Your question has been answered, but I thought I would add this to explain your observations regarding, "At this point I'm not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering."

The item method doesn't directly access the data like you think it does. To do this, you should access the data attribute, which gives you the byte string.

An example:

c = np.array([[1,2,3],
              [4,6,7]], order='C')

f = np.array([[1,2,3],
              [4,6,7]], order='F')

Observe

print c.flags.c_contiguous, f.flags.f_contiguous
# True True

and

print c.nbytes == len(c.data)
# True

Now let's print the contiguous data for both:

nelements = np.prod(c.shape)
bsize = c.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = c.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=c.dtype)[0], # Convert to number.

This prints:

1 2 3 4 6 7

which is what we expect since c is order 'C', i.e., its data is stored row-major contiguous.

On the other hand,

nelements = np.prod(f.shape)
bsize = f.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = f.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=f.dtype)[0], # Convert to number.

prints

1 4 2 6 3 7

which, again, is what we expect to see since f's data is stored column-major contiguous.
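The snippets above are Python 2. A rough Python 3 equivalent, assuming tobytes and np.frombuffer as stand-ins for slicing the data attribute and np.fromstring:

```python
import numpy as np

c = np.array([[1, 2, 3],
              [4, 6, 7]], order='C')
f = np.array([[1, 2, 3],
              [4, 6, 7]], order='F')

# order='A' dumps a Fortran-contiguous array in 'F' order and anything else in 'C' order,
# so for these two arrays the bytes come out in the order they sit in memory
c_mem = np.frombuffer(c.tobytes(order='A'), dtype=c.dtype)
f_mem = np.frombuffer(f.tobytes(order='A'), dtype=f.dtype)

print(c_mem)  # [1 2 3 4 6 7]
print(f_mem)  # [1 4 2 6 3 7]
```

Note that tobytes makes a copy of the data, so this is for inspection only, not for writing into the array.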

Upvotes: 36

Kill Console

Reputation: 2023

NumPy stores data in row major order by default.

>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a.shape
(2, 4)
>>> a.shape = 4,2
>>> a
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

If you change the shape, the order of the data does not change.

If you pass order='F' to reshape, you can get what you want.

>>> b = np.array([1, 2, 3, 4, 5, 6])
>>> b
array([1, 2, 3, 4, 5, 6])
>>> c = b.reshape(2,3,order='F')
>>> c
array([[1, 3, 5],
       [2, 4, 6]])
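For the file-reading case in the question, the same reshape applies to data read as a flat buffer. A sketch under assumptions: the bytes below stand in for a file holding a 2x3 float64 array in column major order (with a real file, np.fromfile should give you the same kind of flat array), and the dtype and shape must be known in advance.

```python
import numpy as np

# Stand-in for the raw contents of a column-major file (dtype and shape are assumed)
raw = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64).tobytes()

# Read the values linearly, then interpret them column by column
flat = np.frombuffer(raw, dtype=np.float64)
m = flat.reshape((2, 3), order='F')

print(m)
# [[1. 3. 5.]
#  [2. 4. 6.]]
```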

Upvotes: 56

Bi Rico

Reputation: 25833

In general, numpy uses order to describe the memory layout, but the Python behavior of the arrays should be consistent regardless of the memory layout. I think you can get the behavior you want using views. A view is an array that shares memory with another array. For example:

import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T

a[1] = 99
print b
# [[ 1  3  5]
#  [99  4  6]]
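A quick sketch to check that the transposed view really shares memory with a, and that it matches what reshape with order='F' would give (the name r is just for this comparison):

```python
import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T            # view: no data is copied
r = a.reshape(2, 3, order='F')   # same arrangement via reshape

print(np.shares_memory(a, b))    # True
print(np.array_equal(b, r))      # True
print(b.flags['F_CONTIGUOUS'])   # True
```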

Hope that helps.

Upvotes: 4
