Reputation: 658
I've got a large nested array a
(256x256x256) that I need to restructure into a list with elements like this:
[ (i,j,k), a[i,j,k] ]
I'm currently doing this as follows:
aflat = a.flatten().tolist()
coords = list(itertools.product(range(256), repeat=3))
thelist = [list(x) for x in zip(coords, aflat)]
This works, but it's rather slow.
I can probably save a second or so by removing the runtime generation of the coordinate vectors and reading those from a file instead. However, the main slowdown seems to be in the last line, which is clocking in at over 6 seconds.
Is there a faster way to generate the data structure I need in Python?
Upvotes: 3
Views: 954
Reputation: 5677
As @P-i commented, the major problem is that the code creates a ton of lists, and Python spends a lot of time doing memory management. To eliminate this, you can use numpy arrays to preallocate the data, and use its repeat
and tile
functions to generate the i,j,k
values:
# order='F' is important here so column-wise assignment can
# occur with a stride of 1. Switching the order results
# in a significant performance hit.
coords = numpy.zeros([a.size,4],'d',order='F')
NI, NJ, NK = a.shape
# build columns for (i,j,k) tuples using repeat and tile
coords[:,0] = numpy.repeat(range(NI),NJ*NK)
coords[:,1] = numpy.tile(numpy.repeat(range(NJ),NK), NI)
coords[:,2] = numpy.tile(range(NK), NI*NJ)
coords[:,3] = a.flatten()
This results in an array where each row is (i,j,k,value)
. It does assume that your original array is in row-major ordering (C-ordered arrays in numpy).
In my timings, based on ten iterations in Python 3.5 on a 2013 MacBook Pro, it took about 20 seconds per transformation to run the OP's transformation and only about 8 seconds per transformation using this method.
The output format really has to be a list, the array can be converted into a list in the final step. However, this increased the transformation time to 13 seconds per transformation in my testing.
Upvotes: 1
Reputation: 284582
To expand on @wii's comment above, you're looking for np.ndenumerate
.
Typically, you'd avoid explicitly creating your list and use an iterator instead. For example:
for (i,j,k), val in np.ndenumerate(your_3d_array):
assert val == your_3d_array[i,j,k]
# Note that we also could have done:
for ind, val in np.ndenumerate(your_3d_array):
assert val == your_3d_array[ind]
However, if you did want to create the full intermediate list, you'd use:
list(np.ndenumerate(your_3d_array))
As a more complete example:
In [1]: import numpy as np
In [2]: x = np.arange(3*4*5).reshape(3, 4, 5)
In [3]: x
Out[7]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
In [4]: list(np.ndenumerate(x))
Out [4]:
[((0, 0, 0), 0),
((0, 0, 1), 1),
((0, 0, 2), 2),
((0, 0, 3), 3),
...
((2, 3, 1), 56),
((2, 3, 2), 57),
((2, 3, 3), 58),
((2, 3, 4), 59)]
Upvotes: 0