David

Reputation: 8298

transform a 3D numpy array into a list of 3 indices

I have a large 3D data matrix, say 10000 x 10000 x 1000. I need to go over every element and write to a file its indices together with the values of two different matrices of the same size. An example of a line:

i j k val1 val2

What I currently do is run three nested loops and print each line as follows. Here is an example with two small 3D matrices:

import numpy as np


vv1= np.array([[[1,2,3],[2,3,4],[3,4,5]],
                [[4,5,6],[5,6,7],[6,7,8]],
                [[7,8,9],[8,9,10],[9,10,11]]])

vv2= np.array([[[1,2,3],[2,3,4],[3,4,5]],
                [[4,5,6],[5,6,7],[6,7,8]],
                [[7,8,9],[8,9,10],[9,10,11]]])

for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            print("{:} {:} {:} {:} {:}".format(x,y,z,vv1[x,y,z], vv2[x,y,z]))

This simple code does the job but slowly.

A different approach I thought of was to build one long 1D list in which each entry holds the three index values, and then apply the same printing logic. For example, using a nested loop:

vv_ind = []

for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            vv_ind.append([x,y,z])

for elem in vv_ind:
    i = tuple(elem)
    print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))

which gives the desired output.

My questions are as follows:

  1. Is there a more "pythonic" way to create that list of indices?
  2. Regarding the last printing loop:

    for elem in vv_ind:
        i = tuple(elem)
        print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))

    Is there a more efficient way to do it?

Again, the arrays given here are just dummy ones to keep the example simple.

Would appreciate some help

Upvotes: 3

Views: 347

Answers (3)

javidcf

Reputation: 59681

If the data is not all integer, you can do that with np.savetxt and a structured array, which keeps the index columns as integers and the value columns as floats:

import numpy as np
import io

# Data
vv1 = np.array([[[  1,  2,  3], [  2,  3,  4],[  3,  4,  5]],
                [[  4,  5,  6], [  5,  6,  7],[  6,  7,  8]],
                [[  7,  8,  9], [  8,  9, 10],[  9, 10, 11]]], np.float32)
vv2 = np.array([[[  1,  2,  3], [  2,  3,  4],[  3,  4,  5]],
                [[  4,  5,  6], [  5,  6,  7],[  6,  7,  8]],
                [[  7,  8,  9], [  8,  9, 10],[  9, 10, 11]]], np.float32)

# Index grids in 'ij' order, so ravel() lines up with vv1.ravel()
xx, yy, zz = np.meshgrid(*map(range, vv1.shape), indexing='ij')
# Structured array of indices and data, one record per element
a = np.empty(vv1.size, dtype='i,i,i,f,f')
a['f0'] = xx.ravel()
a['f1'] = yy.ravel()
a['f2'] = zz.ravel()
a['f3'] = vv1.ravel()
a['f4'] = vv2.ravel()
# Using StringIO here to show result, normally would use a file object or file name
s = io.StringIO()
np.savetxt(s, a, fmt='%d %d %d %.3f %.3f')
print(s.getvalue())

Output:

0 0 0 1.000 1.000
0 0 1 2.000 2.000
0 0 2 3.000 3.000
0 1 0 2.000 2.000
0 1 1 3.000 3.000
0 1 2 4.000 4.000
0 2 0 3.000 3.000
0 2 1 4.000 4.000
0 2 2 5.000 5.000
1 0 0 4.000 4.000
1 0 1 5.000 5.000
1 0 2 6.000 6.000
1 1 0 5.000 5.000
1 1 1 6.000 6.000
1 1 2 7.000 7.000
1 2 0 6.000 6.000
1 2 1 7.000 7.000
1 2 2 8.000 8.000
2 0 0 7.000 7.000
2 0 1 8.000 8.000
2 0 2 9.000 9.000
2 1 0 8.000 8.000
2 1 1 9.000 9.000
2 1 2 10.000 10.000
2 2 0 9.000 9.000
2 2 1 10.000 10.000
2 2 2 11.000 11.000

Note, though, that np.savetxt just loops over the rows internally, so it should not be expected to be magically faster. It may not be worth creating the additional big array for it.
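If the extra big array is the concern, one possible middle ground is to write the file one slab at a time. The following is only a sketch under my own assumptions: `write_slabs` is a hypothetical helper name, not part of NumPy, and both arrays are assumed to be 3D with the same shape. It still relies on np.savetxt, so it is not expected to be faster, only lighter on memory.

```python
import io

import numpy as np


def write_slabs(fh, vv1, vv2, fmt="%d %d %d %.3f %.3f"):
    """Write 'i j k val1 val2' lines one slab (fixed i) at a time."""
    # j/k index columns for one slab; identical for every i, so build once.
    jj, kk = np.meshgrid(*map(range, vv1.shape[1:]), indexing="ij")
    jj, kk = jj.ravel(), kk.ravel()
    for i in range(vv1.shape[0]):
        # Only one slab's worth of rows is held in memory at a time.
        block = np.column_stack(
            [np.full(jj.size, i), jj, kk, vv1[i].ravel(), vv2[i].ravel()]
        )
        np.savetxt(fh, block, fmt=fmt)


# Small demo arrays (made up for illustration)
vv1 = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
vv2 = vv1 + 1
s = io.StringIO()  # stand-in for a real file object
write_slabs(s, vv1, vv2)
print(s.getvalue())
```

Each block is a plain float array, so the index columns are stored as floats and only formatted as integers by the `%d` specifiers.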

Upvotes: 1

Mykola Zotko

Reputation: 17794

To create the indices you can use itertools.product:

from itertools import product

product(*3 * [range(3)]) # generator of indices

or

product(range(3), range(3), range(3))

or

from itertools import product, repeat

product(*repeat(range(3), 3))

You can simplify your code:

from itertools import product, repeat

for idx in product(*repeat(range(3), 3)):
    print(*idx, vv1[idx], vv2[idx])

As @a_guest mentioned in a comment, we can use np.ndindex(*vv1.shape) instead of product(*repeat(range(3), 3)), which also avoids hard-coding the array size.
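For illustration, the np.ndindex variant could look roughly like this. The small arrays and the `lines` list are made up for the example; in real code you would probably write each line straight to a file instead of collecting them.

```python
import numpy as np

# Dummy arrays, only for demonstration
vv1 = np.arange(27).reshape(3, 3, 3)
vv2 = vv1 * 10

lines = []
# np.ndindex yields index tuples (i, j, k) in C order for any shape
for idx in np.ndindex(*vv1.shape):
    lines.append(" ".join(map(str, (*idx, vv1[idx], vv2[idx]))))

print("\n".join(lines))
```

This is still a Python-level loop over every element, so it is tidier than the nested loops rather than fundamentally faster.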

Upvotes: 1

a_guest

Reputation: 36239

You can use np.mgrid to generate the indices. If you don't mind saving everything as the same data type, you can stack the arrays together and save the result via np.save or np.savetxt:

In [1]: import numpy as np                                                                    

In [2]: a = np.random.randint(0, 255, size=(4, 4, 4))                                         

In [3]: b = np.random.randint(0, 255, size=(4, 4, 4))                                         

In [4]: data = np.stack([x.ravel() for x in np.mgrid[:4, :4, :4]] + [a.ravel(), b.ravel()], axis=1)                                                                                 

In [5]: np.save('/tmp/test.npy', data)                                                        

In [6]: data                                                                                  
Out[6]: 
array([[  0,   0,   0, 169,  35],
       [  0,   0,   1,  14, 120],
       [  0,   0,   2,  93, 207],
       [  0,   0,   3,  70, 158],
       [  0,   1,   0, 115,  52],
       [  0,   1,   1,  10, 248],
       [  0,   1,   2,   5, 123],
       [  0,   1,   3, 125, 143],
       [  0,   2,   0,  73, 241],
       [  0,   2,   1,  25, 118],
       [  0,   2,   2, 240, 159],
       [  0,   2,   3,  60, 179],
       [  0,   3,   0,  29, 221],
       [  0,   3,   1, 214,  33],
       [  0,   3,   2, 145,  60],
       [  0,   3,   3, 207,  74],
       [  1,   0,   0,   7,  37],
       [  1,   0,   1, 146, 192],
       [  1,   0,   2, 227,  83],
       [  1,   0,   3, 247,  51],
       [  1,   1,   0, 253,  18],
       [  1,   1,   1, 188,   2],
       [  1,   1,   2, 164, 252],
       [  1,   1,   3, 192, 229],
       [  1,   2,   0,  18, 236],
       [  1,   2,   1,  85,  48],
       [  1,   2,   2,  20, 233],
       [  1,   2,   3,  81, 152],
       [  1,   3,   0, 122,  30],
       [  1,   3,   1, 227, 221],
       [  1,   3,   2,  11, 247],
       [  1,   3,   3,  84, 203],
       [  2,   0,   0,   5,  94],
       [  2,   0,   1, 174, 179],
       [  2,   0,   2, 224, 222],
       [  2,   0,   3, 168,  40],
       [  2,   1,   0, 160, 136],
       [  2,   1,   1,  16, 121],
       [  2,   1,   2, 237, 241],
       [  2,   1,   3,  70,  29],
       [  2,   2,   0, 127, 188],
       [  2,   2,   1,  33,  67],
       [  2,   2,   2,   4, 138],
       [  2,   2,   3, 153, 114],
       [  2,   3,   0, 162,   8],
       [  2,   3,   1, 254,  91],
       [  2,   3,   2, 153,  69],
       [  2,   3,   3, 167,  33],
       [  3,   0,   0,  99, 101],
       [  3,   0,   1,  26,   2],
       [  3,   0,   2, 162, 131],
       [  3,   0,   3,  23,  97],
       [  3,   1,   0, 226,  37],
       [  3,   1,   1,   5, 130],
       [  3,   1,   2, 215, 164],
       [  3,   1,   3, 247,  95],
       [  3,   2,   0, 138,  49],
       [  3,   2,   1, 248, 175],
       [  3,   2,   2, 134,  39],
       [  3,   2,   3, 170,  67],
       [  3,   3,   0,   1, 177],
       [  3,   3,   1, 245,  31],
       [  3,   3,   2,  71, 160],
       [  3,   3,   3,  81,   9]])
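The session above uses np.save, which writes a binary .npy file. If a text file is wanted, the same stacked array can be passed to np.savetxt. Here is a minimal sketch, assuming integer data and small made-up arrays; the slices for np.mgrid are built from the shape so the size is not hard-coded, and io.StringIO stands in for a real file.

```python
import io

import numpy as np

# Dummy integer arrays, only for demonstration
a = np.arange(8).reshape(2, 2, 2)
b = a + 100

# One flattened index array per axis, taken from the array's own shape
grid = np.mgrid[tuple(map(slice, a.shape))].reshape(a.ndim, -1)
data = np.column_stack([*grid, a.ravel(), b.ravel()])

buf = io.StringIO()  # stand-in for a file name or file object
np.savetxt(buf, data, fmt="%d")
print(buf.getvalue())
```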

Otherwise, you can use np.ndindex to iterate over the array indices:

In [10]: with open('/tmp/test.txt', 'w') as fh: 
    ...:     for index in np.ndindex(*a.shape): 
    ...:         data = map(str, index + (a[index], b[index])) 
    ...:         fh.write(','.join(data) + '\n')

Upvotes: 2
