Marco Tompitak
Marco Tompitak

Reputation: 658

Efficiently Reshape 3D Numpy Array into 1D List Plus Coordinate Vector

I've got a large nested array a (256x256x256) that I need to restructure into a list with elements like this:

[ (i,j,k), a[i,j,k] ]

I'm currently doing this as follows:

aflat = a.flatten().tolist()
coords = list(itertools.product(range(256), repeat=3))
thelist = [list(x) for x in zip(coords, aflat)]

This works, but it's rather slow.

I can probably save a second or so by removing the runtime generation of the coordinate vectors and reading those from a file instead. However, the main slowdown seems to be in the last line, which is clocking in at over 6 seconds.

Is there a faster way to generate the data structure I need in Python?

Upvotes: 3

Views: 954

Answers (2)

sfstewman
sfstewman

Reputation: 5677

As @P-i commented, the major problem is that the code creates a ton of lists, and Python spends a lot of time doing memory management. To eliminate this, you can use numpy arrays to preallocate the data, and use its repeat and tile functions to generate the i,j,k values:

# order='F' is important here so column-wise assignment can
# occur with a stride of 1.  Switching the order results
# in a significant performance hit.
coords = numpy.zeros([a.size,4],'d',order='F')

NI, NJ, NK = a.shape

# build columns for (i,j,k) tuples using repeat and tile
coords[:,0] = numpy.repeat(range(NI),NJ*NK)
coords[:,1] = numpy.tile(numpy.repeat(range(NJ),NK), NI)
coords[:,2] = numpy.tile(range(NK), NI*NJ)
coords[:,3] = a.flatten()

This results in an array where each row is (i,j,k,value). It does assume that your original array is in row-major ordering (C-ordered arrays in numpy).

In my timings, based on ten iterations in Python 3.5 on a 2013 MacBook Pro, it took about 20 seconds per transformation to run the OP's transformation and only about 8 seconds per transformation using this method.

The output format really has to be a list, the array can be converted into a list in the final step. However, this increased the transformation time to 13 seconds per transformation in my testing.

Upvotes: 1

Joe Kington
Joe Kington

Reputation: 284582

To expand on @wii's comment above, you're looking for np.ndenumerate.

Typically, you'd avoid explicitly creating your list and use an iterator instead. For example:

for (i,j,k), val in np.ndenumerate(your_3d_array):
    assert val == your_3d_array[i,j,k]

# Note that we also could have done:
for ind, val in np.ndenumerate(your_3d_array):
    assert val == your_3d_array[ind]

However, if you did want to create the full intermediate list, you'd use:

list(np.ndenumerate(your_3d_array))

As a more complete example:

In [1]: import numpy as np

In [2]: x = np.arange(3*4*5).reshape(3, 4, 5)

In [3]: x
Out[7]:
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In  [4]: list(np.ndenumerate(x))
Out [4]: 
[((0, 0, 0), 0),
 ((0, 0, 1), 1),
 ((0, 0, 2), 2),
 ((0, 0, 3), 3),
...
 ((2, 3, 1), 56),
 ((2, 3, 2), 57),
 ((2, 3, 3), 58),
 ((2, 3, 4), 59)]

Upvotes: 0

Related Questions