Reputation: 33
I have some data I need to pre-process for a later step in a 3D Convolutional Network. The data comes in a file formatted like this:
POSITION
x y z (feature 1 x) (feature 1 y) (feature 1 z) (feature 2 x) (feature 2 y ...
1.2 0.54 2.3 0.04 0.2 -0.9 -0.2 0.65 ...
...(more rows of the same format)...
After some other steps that operate on the positional data and the features, I get a PyTorch tensor with dimensions [height][width][depth][features] (or, equivalently, a NumPy array). The first three dimensions are positional, which lets me plot the features using colours, and each [features] entry is a vector containing the feature values at that position.
These are pretty large files, and I'd like not to have to repeat the conversion from the file format shown above to the tensor/array form later during processing. I'm thinking of using torch.save(tensor, 'file.pt').
My question is: what is the best file format to save this data so that it can be easily accessed later without the need for any pre-processing? Having to serialize it with PyTorch seems to be quite a convoluted way to save a type of data I would expect to have a more specific/designated file format.
Upvotes: 2
Views: 1678
Reputation: 33
I think I've found a more direct way to do it. NumPy supports saving its arrays as a .npy file.
The procedure is pretty straightforward. To save an array array_1 into the file numpy_array_1.npy, all you need to do is:
np.save('numpy_array_1.npy', array_1)
And then to load it into array_2:
array_2 = np.load('numpy_array_1.npy')
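For completeness, here is a minimal round-trip sketch for an array shaped like the one in the question. The array contents, the shape (4, 5, 6, 3), and the file name grid.npy are made up for illustration. It also shows np.load with mmap_mode="r", which may help with large files because data is read from disk only when a slice is accessed.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the pre-processed data:
# [height][width][depth][features] = (4, 5, 6, 3)
rng = np.random.default_rng(0)
grid = rng.random((4, 5, 6, 3), dtype=np.float32)

# Illustrative path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "grid.npy")

# Save once; shape and dtype are stored in the .npy header
np.save(path, grid)

# Load the whole array back into memory
restored = np.load(path)

# For large files: memory-map instead of reading everything up front
lazy = np.load(path, mmap_mode="r")

# Copy a single height slice (all widths, depths, features) into memory
block = np.array(lazy[0])
```

If a PyTorch tensor is needed afterwards, torch.from_numpy(restored) should wrap the loaded array without going back through the original text format.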
Upvotes: 1