Reputation: 8298
I have a large 3D data matrix, say 10000x10000x1000. I need to go over every element and write to a file the indices together with the values of two different matrices of the same size. An example of a line:
i j k val1 val2
What I currently do is run three nested loops and print each line. Here is an example with two small 3D matrices:
import numpy as np
vv1= np.array([[[1,2,3],[2,3,4],[3,4,5]],
[[4,5,6],[5,6,7],[6,7,8]],
[[7,8,9],[8,9,10],[9,10,11]]])
vv2= np.array([[[1,2,3],[2,3,4],[3,4,5]],
[[4,5,6],[5,6,7],[6,7,8]],
[[7,8,9],[8,9,10],[9,10,11]]])
for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            print("{:} {:} {:} {:} {:}".format(x, y, z, vv1[x, y, z], vv2[x, y, z]))
This simple code does the job, but slowly.
A different approach I thought of was to build one long 1D list in which each entry holds the three index values, and then apply the same printing logic. For example, using nested loops to build the list:
vv_ind = []
for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            vv_ind.append([x, y, z])
for elem in vv_ind:
    i = tuple(elem)
    print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))
This gives the desired output.
My question concerns the last printing loop:
for elem in vv_ind:
    i = tuple(elem)
    print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))
Is there a more efficient way to do it?
Again, the arrays given here are just dummy ones for ease of illustration.
I would appreciate some help.
Upvotes: 3
Views: 347
Reputation: 59681
You can do that with np.savetxt and a structured array, which is useful if the data is not integer (the index columns and the value columns then have different types):
import numpy as np
import io
# Data
vv1 = np.array([[[ 1, 2, 3], [ 2, 3, 4],[ 3, 4, 5]],
[[ 4, 5, 6], [ 5, 6, 7],[ 6, 7, 8]],
[[ 7, 8, 9], [ 8, 9, 10],[ 9, 10, 11]]], np.float32)
vv2 = np.array([[[ 1, 2, 3], [ 2, 3, 4],[ 3, 4, 5]],
[[ 4, 5, 6], [ 5, 6, 7],[ 6, 7, 8]],
[[ 7, 8, 9], [ 8, 9, 10],[ 9, 10, 11]]], np.float32)
# Index grids matching the array shape
xx, yy, zz = np.meshgrid(*map(range, vv1.shape), indexing='ij')
# Structured array of indices and data (one record per element)
a = np.empty(vv1.size, dtype='i,i,i,f,f')
a['f0'] = xx.ravel()
a['f1'] = yy.ravel()
a['f2'] = zz.ravel()
a['f3'] = vv1.ravel()
a['f4'] = vv2.ravel()
# Using StringIO here to show result, normally would use a file object or file name
s = io.StringIO()
np.savetxt(s, a, fmt='%d %d %d %.3f %.3f')
print(s.getvalue())
Output:
0 0 0 1.000 1.000
0 0 1 2.000 2.000
0 0 2 3.000 3.000
0 1 0 2.000 2.000
0 1 1 3.000 3.000
0 1 2 4.000 4.000
0 2 0 3.000 3.000
0 2 1 4.000 4.000
0 2 2 5.000 5.000
1 0 0 4.000 4.000
1 0 1 5.000 5.000
1 0 2 6.000 6.000
1 1 0 5.000 5.000
1 1 1 6.000 6.000
1 1 2 7.000 7.000
1 2 0 6.000 6.000
1 2 1 7.000 7.000
1 2 2 8.000 8.000
2 0 0 7.000 7.000
2 0 1 8.000 8.000
2 0 2 9.000 9.000
2 1 0 8.000 8.000
2 1 1 9.000 9.000
2 1 2 10.000 10.000
2 2 0 9.000 9.000
2 2 1 10.000 10.000
2 2 2 11.000 11.000
np.savetxt really just loops through the data internally, though, so it should not be expected to be magically faster. It may not be worth creating the additional big array for it.
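One way to avoid building the additional big array is to write one slab (fixed first index) at a time, so only a small structured array is alive at any moment. The following is a minimal sketch under that assumption; the helper name `write_indexed` and the sample arrays are made up for illustration, and it is not expected to be faster than the single np.savetxt call, only lighter on memory.

```python
import io
import numpy as np

def write_indexed(fh, vv1, vv2, fmt="%d %d %d %.3f %.3f"):
    """Write 'i j k val1 val2' lines, one slab (fixed i) per savetxt call."""
    n0, n1, n2 = vv1.shape
    # Index columns for a single slab; they are the same for every i.
    jj, kk = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    block = np.empty(n1 * n2, dtype="i8,i8,i8,f8,f8")
    block["f1"] = jj.ravel()
    block["f2"] = kk.ravel()
    for i in range(n0):
        block["f0"] = i                  # broadcast the slab index
        block["f3"] = vv1[i].ravel()
        block["f4"] = vv2[i].ravel()
        np.savetxt(fh, block, fmt=fmt)   # appends to the open handle

# Small hypothetical arrays; StringIO stands in for a real file object.
vv1 = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
vv2 = vv1 * 2
buf = io.StringIO()
write_indexed(buf, vv1, vv2)
lines = buf.getvalue().splitlines()
print(lines[0])
print(lines[-1])
```

With a real file, pass a handle opened with open(path, 'w') instead of the StringIO buffer.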
Upvotes: 1
Reputation: 17794
To create a list of indices you can use the function itertools.product:
from itertools import product
product(*3 * [range(3)]) # generator of indices
or
product(range(3), range(3), range(3))
or
from itertools import product, repeat
product(*repeat(range(3), 3))
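As a quick check, the three spellings above can be compared directly. This is only a small sketch; the names `form_a`, `form_b` and `form_c` are made up for illustration. The last index varies fastest, which matches the order of the three nested loops in the question.

```python
from itertools import product, repeat

n = 3
form_a = list(product(*3 * [range(n)]))
form_b = list(product(range(n), range(n), range(n)))
form_c = list(product(*repeat(range(n), 3)))

# All three produce the same sequence of (i, j, k) tuples.
print(form_a == form_b == form_c)
print(form_a[:4])
```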
You can simplify your code:
from itertools import product, repeat
for idx in product(*repeat(range(3), 3)):
    print(*idx, vv1[idx], vv2[idx])
As @a_guest mentioned in the comments, we can use np.ndindex(*vv1.shape) instead of product(*repeat(range(3), 3)).
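A minimal sketch of the np.ndindex variant, assuming two small made-up arrays in place of the real data. Unlike the hard-coded range(3), it takes the loop bounds from the array shape, so it also works when the axes have different lengths. The `rows` list is only there to make the result easy to inspect.

```python
import numpy as np

# Hypothetical small arrays with unequal axis lengths.
vv1 = np.arange(12).reshape(2, 3, 2)
vv2 = vv1 + 100

rows = []
for idx in np.ndindex(*vv1.shape):
    # idx is a tuple (i, j, k) and can index the arrays directly.
    rows.append((*idx, int(vv1[idx]), int(vv2[idx])))
    print(*rows[-1])
```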
Upvotes: 1
Reputation: 36239
You can use np.mgrid to generate the indices, and if you don't mind saving everything as the same data type, you can just stack the arrays together and save the result via np.save or np.savetxt:
In [1]: import numpy as np
In [2]: a = np.random.randint(0, 255, size=(4, 4, 4))
In [3]: b = np.random.randint(0, 255, size=(4, 4, 4))
In [4]: data = np.stack([x.ravel() for x in np.mgrid[:4, :4, :4]] + [a.ravel(), b.ravel()], axis=1)
In [5]: np.save('/tmp/test.npy', data)
In [6]: data
Out[6]:
array([[ 0, 0, 0, 169, 35],
[ 0, 0, 1, 14, 120],
[ 0, 0, 2, 93, 207],
[ 0, 0, 3, 70, 158],
[ 0, 1, 0, 115, 52],
[ 0, 1, 1, 10, 248],
[ 0, 1, 2, 5, 123],
[ 0, 1, 3, 125, 143],
[ 0, 2, 0, 73, 241],
[ 0, 2, 1, 25, 118],
[ 0, 2, 2, 240, 159],
[ 0, 2, 3, 60, 179],
[ 0, 3, 0, 29, 221],
[ 0, 3, 1, 214, 33],
[ 0, 3, 2, 145, 60],
[ 0, 3, 3, 207, 74],
[ 1, 0, 0, 7, 37],
[ 1, 0, 1, 146, 192],
[ 1, 0, 2, 227, 83],
[ 1, 0, 3, 247, 51],
[ 1, 1, 0, 253, 18],
[ 1, 1, 1, 188, 2],
[ 1, 1, 2, 164, 252],
[ 1, 1, 3, 192, 229],
[ 1, 2, 0, 18, 236],
[ 1, 2, 1, 85, 48],
[ 1, 2, 2, 20, 233],
[ 1, 2, 3, 81, 152],
[ 1, 3, 0, 122, 30],
[ 1, 3, 1, 227, 221],
[ 1, 3, 2, 11, 247],
[ 1, 3, 3, 84, 203],
[ 2, 0, 0, 5, 94],
[ 2, 0, 1, 174, 179],
[ 2, 0, 2, 224, 222],
[ 2, 0, 3, 168, 40],
[ 2, 1, 0, 160, 136],
[ 2, 1, 1, 16, 121],
[ 2, 1, 2, 237, 241],
[ 2, 1, 3, 70, 29],
[ 2, 2, 0, 127, 188],
[ 2, 2, 1, 33, 67],
[ 2, 2, 2, 4, 138],
[ 2, 2, 3, 153, 114],
[ 2, 3, 0, 162, 8],
[ 2, 3, 1, 254, 91],
[ 2, 3, 2, 153, 69],
[ 2, 3, 3, 167, 33],
[ 3, 0, 0, 99, 101],
[ 3, 0, 1, 26, 2],
[ 3, 0, 2, 162, 131],
[ 3, 0, 3, 23, 97],
[ 3, 1, 0, 226, 37],
[ 3, 1, 1, 5, 130],
[ 3, 1, 2, 215, 164],
[ 3, 1, 3, 247, 95],
[ 3, 2, 0, 138, 49],
[ 3, 2, 1, 248, 175],
[ 3, 2, 2, 134, 39],
[ 3, 2, 3, 170, 67],
[ 3, 3, 0, 1, 177],
[ 3, 3, 1, 245, 31],
[ 3, 3, 2, 71, 160],
[ 3, 3, 3, 81, 9]])
Otherwise, you can also use np.ndindex to iterate over the array's indices:
In [10]: with open('/tmp/test.txt', 'w') as fh:
    ...:     for index in np.ndindex(*a.shape):
    ...:         data = map(str, index + (a[index], b[index]))
    ...:         fh.write(','.join(data) + '\n')
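For the stacked-array approach, np.save produces a binary .npy file. If a text file with one "i j k val1 val2" line per element is needed, the same stacked array can be passed to np.savetxt instead. The sketch below assumes integer data and uses np.indices(a.shape) in place of np.mgrid so that the shape is not hard-coded. The arrays and the in-memory buffer are placeholders for illustration.

```python
import io
import numpy as np

# Hypothetical small integer arrays.
a = np.arange(8).reshape(2, 2, 2)
b = a * 3

# One column per index axis, followed by the two value columns.
columns = [x.ravel() for x in np.indices(a.shape)] + [a.ravel(), b.ravel()]
data = np.stack(columns, axis=1)

# StringIO stands in for a file name or an open file object.
buf = io.StringIO()
np.savetxt(buf, data, fmt="%d")
text_lines = buf.getvalue().splitlines()
print(text_lines[0])
print(text_lines[-1])
```

Because the stacked array has a single dtype, this only fits the case where indices and values can share one type. Mixed types need a structured array or a per-element loop.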
Upvotes: 2