Reputation: 36624
I am scraping car listings and will end up with many pictures; that part is not a problem. I also want to save each car's specifications, and I am wondering what the best way to do this efficiently is. Ideally, I would have something like the built-in datasets that many libraries provide, such as:
print(dataset)
{
'image': ([255, 203, 145, ...]),
'info': (['Audi', '355 HP', ...])
}
That way, I could easily extract images and info with dataset['info'], or something. I could easily assign both like x, y = dataset.
Upvotes: 1
Views: 49
Reputation: 180
There are several options, but for structured data like this, it's common to store dictionary-like data using HDF5.
See the Python quick start guide and full documentation here:
http://docs.h5py.org/en/stable/quick.html
Here's a full Python example. Notice the dictionary-like interface.
import h5py
import numpy as np

#####
# writing output file
#####
with h5py.File("output.h5", "w") as my_file:
    # np.bytes_ stores the text as a byte string (np.string_ was removed in NumPy 2.0)
    my_file["info"] = np.bytes_(b"some_random pixels")
    my_file["image"] = np.random.rand(5, 5)

#####
# reading input file
#####
with h5py.File("output.h5", "r") as loaded_file:
    print(loaded_file["info"][()])   # scalar dataset, read back as a byte string
    print(loaded_file["image"][:])   # whole dataset, read back as a NumPy array
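To get closer to the x, y = dataset pattern from the question, one possible layout is a group per car, with the pixels as a dataset and the specifications as attributes on that group. This is only a sketch: the helper names save_cars and load_cars, the car_00000 group naming, the spec keys, and the sample data are all hypothetical and not part of h5py or the original answer.

```python
import os
import tempfile

import h5py
import numpy as np


def save_cars(path, images, specs):
    """Write one HDF5 group per car: pixels as a dataset, specs as attributes."""
    with h5py.File(path, "w") as f:
        for i, (image, spec) in enumerate(zip(images, specs)):
            group = f.create_group(f"car_{i:05d}")  # hypothetical naming scheme
            group.create_dataset("image", data=image, compression="gzip")
            for key, value in spec.items():
                group.attrs[key] = value  # short strings fit well in attributes


def load_cars(path):
    """Return (images, specs) as two parallel lists."""
    images, specs = [], []
    with h5py.File(path, "r") as f:
        for name in sorted(f.keys()):
            group = f[name]
            images.append(group["image"][:])   # read pixels into a NumPy array
            specs.append(dict(group.attrs))    # copy attributes into a plain dict
    return images, specs


# Hypothetical sample data standing in for scraped pictures and specifications.
images = [np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8) for _ in range(2)]
specs = [
    {"make": "Audi", "power": "355 HP"},
    {"make": "ExampleMake", "power": "100 HP"},
]

path = os.path.join(tempfile.mkdtemp(), "cars.h5")
save_cars(path, images, specs)
x, y = load_cars(path)
```

Keeping each car in its own group means images of different sizes can sit in the same file, and a single car can be read without loading the rest. If all images share one shape, a single stacked dataset would probably be faster to read in bulk.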
Upvotes: 1