Reputation: 447
I'm working on a machine learning project. My dataset is composed of thousands of X-ray pictures, and every time I want to work on the project I have to reload the pictures and pre-process them, which is very time-consuming. So I want to read my images once and write the list of thousands of 224x224x3 matrices to a file that I can load every time I need to work on the project.
I've already found some functions that allow me to write/read lists, but they don't seem to write the whole matrices, only part of them.
This is the code I used to write the file:
with open(obj_dir + "train_data_p", "w") as file:
    file.write(str(train_data_p))
This is what I get when I open my training dataset file in Notepad. As you can see from the "...," parts, it only shows snippets of the matrices:
[array([[[0.26666668, 0.26666668, 0.26666668],
        [0.32156864, 0.32156864, 0.32156864],
        [0.33333334, 0.33333334, 0.33333334],
        ...,
        [0.75686276, 0.75686276, 0.75686276],
        [0.77254903, 0.77254903, 0.77254903],
        [0.7764706 , 0.7764706 , 0.7764706 ]],
       [[0.27058825, 0.27058825, 0.27058825],
        [0.28627452, 0.28627452, 0.28627452],
        [0.31764707, 0.31764707, 0.31764707],
        ...,
        [0.7607843 , 0.7607843 , 0.7607843 ],
        [0.7647059 , 0.7647059 , 0.7647059 ],
        [0.8039216 , 0.8039216 , 0.8039216 ]],
       [[0.3019608 , 0.3019608 , 0.3019608 ],
        [0.34901962, 0.34901962, 0.34901962],
        [0.27058825, 0.27058825, 0.27058825],
        ...,
        [0.78431374, 0.78431374, 0.78431374],
        [0.7764706 , 0.7764706 , 0.7764706 ],
        [0.78431374, 0.78431374, 0.78431374]],
       ...,
       [[0.1254902 , 0.1254902 , 0.1254902 ],
        [0.1254902 , 0.1254902 , 0.1254902 ],
        [0.12156863, 0.12156863, 0.12156863],
How can I write/store the whole dataset so I don't have to read and process the images every time? Please help!
Upvotes: 0
Views: 139
Reputation: 457
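You are seeing ellipses in the file because you are writing str(train_data_p) to the file, not the actual train_data_p object. When numpy converts a large array to a string, it summarizes it: any array with more than 1000 elements (the default threshold in np.set_printoptions) is printed with only its first and last few items and "..." in between, so the text representation simply does not contain the rest of the data.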
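To see the truncation directly, here is a small sketch. The zero-filled array is a hypothetical stand-in for one pre-processed 224x224x3 image; a single image already has 150,528 values, far above the default print threshold, so str() summarizes it.

```python
import numpy as np

# Hypothetical stand-in for one pre-processed 224x224x3 image
img = np.zeros((224, 224, 3), dtype=np.float32)

text = str(img)

# 224 * 224 * 3 = 150528 elements, well above the default threshold of 1000,
# so numpy prints only the edge items and inserts "..." in the middle
print(img.size)       # 150528
print("..." in text)  # True
```

Whatever is written to the file this way is only that summary, which is why storing the arrays themselves (in a binary format) is the fix rather than writing their text form.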
As pointed out in other answers, there are numerous packages that help with storing large data. If you are using numpy, this answer may help you too.
Upvotes: 1
Reputation: 111
You can do it with the numpy.save() and numpy.load() functions:
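import numpy as np
# np.save appends the ".npy" extension if the file name does not already have it
np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
# Load it back using the full file name, including the extension
arr = np.load('/tmp/123.npy')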
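Applied to the question's data, a minimal sketch might stack the list of equally shaped images into one array of shape (N, 224, 224, 3) and save it once. The random data, the small N, and the file name train_data_p.npy are placeholders for illustration. np.load also accepts mmap_mode="r", which memory-maps the file so slices are read from disk on demand instead of loading everything into RAM; that may help if the dataset is large.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the pre-processed list of images
# (the question has thousands; 5 keeps the example fast)
rng = np.random.default_rng(0)
train_data_p = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(5)]

# Stack the list into a single (N, 224, 224, 3) array
data = np.stack(train_data_p)

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "train_data_p.npy")  # hypothetical file name

# Binary .npy file: stores every value plus shape and dtype, no truncation
np.save(path, data)

# In a later session: read the whole array back into memory...
loaded = np.load(path)
# ...or memory-map it so only the slices you index are read from disk
mapped = np.load(path, mmap_mode="r")

print(loaded.shape, loaded.dtype)    # (5, 224, 224, 3) float32
print(np.array_equal(loaded, data))  # True
```

Stacking into one array keeps the file a single contiguous block, which is what makes memory-mapping possible; a plain Python list of arrays would have to be saved differently.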
Upvotes: 1
Reputation: 3981
You can easily serialize your data using built-in modules.
There are several options:
Or any other third-party serialization package available on pip.
More about serialization: https://en.wikipedia.org/wiki/Serialization
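As one example of a built-in option, here is a sketch using the standard pickle module, which can store the Python list of numpy arrays as-is. The random data and the file name train_data_p.pkl are placeholders. Note that pickle files should only be loaded if you created them yourself (or otherwise trust them), since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the list of pre-processed images
rng = np.random.default_rng(1)
train_data_p = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(3)]

path = os.path.join(tempfile.mkdtemp(), "train_data_p.pkl")  # hypothetical name

# pickle writes bytes, so the file must be opened in binary mode ("wb")
with open(path, "wb") as f:
    pickle.dump(train_data_p, f, protocol=pickle.HIGHEST_PROTOCOL)

# Only unpickle trusted files: loading can run arbitrary code
with open(path, "rb") as f:
    restored = pickle.load(f)

print(len(restored), restored[0].shape)  # 3 (224, 224, 3)
print(all(np.array_equal(a, b) for a, b in zip(restored, train_data_p)))  # True
```

Unlike the text file in the question, this opens the file in binary mode and stores the objects themselves rather than their str() form.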
Upvotes: 0