Reputation: 674
The cifar10 tutorial deals with binary files as input. Each record/example in these CIFAR-10 data files contains both the label (first element) and the image data. The first answer on this page shows how to write a binary file from a NumPy array (which holds the label and image data in each row) using ndarray.tofile() as follows:
import numpy as np
images_and_labels_array = np.array([[...], ...], dtype=np.uint8)
images_and_labels_array.tofile("/tmp/images.bin")
This works well for me when there are at most 256 classes, since the uint8 datatype is sufficient. However, when there are more than 256 classes, I have to change images_and_labels_array to dtype=np.uint16. The consequence is that the file size doubles, because every pixel value is then stored in two bytes as well. I would like to know if there is an efficient way to avoid this. If yes, please provide an example.
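To make the size concern concrete, here is a small sketch of the per-record arithmetic. It assumes CIFAR-10 sized images (32 x 32 pixels, 3 channels); the variable names are only illustrative.

```python
import numpy as np

# Assumption: CIFAR-10 sized images (32 x 32 pixels, 3 channels), one label per record.
image_bytes = 32 * 32 * 3  # 3072 bytes of pixel data at uint8

# Label and pixels share one uint8 array: 1 label byte + 3072 pixel bytes.
record_uint8 = np.zeros(1 + image_bytes, dtype=np.uint8).nbytes

# Whole array promoted to uint16: every element now takes 2 bytes.
record_uint16 = np.zeros(1 + image_bytes, dtype=np.uint16).nbytes

# Only the label widened to uint16, pixels kept at uint8.
record_mixed = np.dtype(np.uint16).itemsize + image_bytes

print(record_uint8, record_uint16, record_mixed)
```

So what I am looking for is a layout close to the third number: a wider label without widening the pixels.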
Upvotes: 0
Views: 641
Reputation: 633
When I write binary files I usually just use the Python module struct, which works roughly like this:
import struct
import numpy as np

image = np.zeros([2, 300, 300], dtype=np.uint8)
label = np.zeros([2, 1], dtype=np.uint16)

# open in binary mode ('wb'): struct.pack returns bytes
with open('data.bin', 'wb') as fo:
    s = image.shape
    for k in range(s[0]):
        # write label as uint16 (2 bytes)
        fo.write(struct.pack('H', int(label[k, 0])))
        # write image as uint8 (1 byte per pixel)
        for i in range(s[1]):
            for j in range(s[2]):
                fo.write(struct.pack('B', int(image[k, i, j])))
This should result in a binary file of 300*300*2 + 2*1*2 = 180004 bytes. It's probably not the fastest way to get the job done, but so far it has been fast enough for me. For other datatypes, see the struct documentation.
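If the per-pixel loop becomes too slow, one possible alternative is a NumPy structured dtype, which keeps the same record layout (a 2-byte label followed by uint8 pixels) and writes everything with a single tofile() call. This is only a sketch under assumptions: the field names "label" and "image", the file name, and the example values are made up for illustration, and the label is stored little-endian.

```python
import os
import tempfile
import numpy as np

# Hypothetical sizes for illustration: 2 images of 300 x 300 pixels.
num_records, height, width = 2, 300, 300

# One record = 2-byte little-endian label followed by the raw uint8 pixels.
# Structured dtypes are packed by default, so no padding is inserted.
record_dtype = np.dtype([("label", "<u2"), ("image", np.uint8, (height, width))])

records = np.zeros(num_records, dtype=record_dtype)
records["label"] = [300, 7]        # labels above 255 fit in uint16
records["image"][1, 0, 0] = 255    # mark one pixel to check the round trip

path = os.path.join(tempfile.gettempdir(), "data_structured.bin")
records.tofile(path)               # writes all records in one call

file_size = os.path.getsize(path)  # 2 * (2 + 300 * 300) bytes

# Reading back requires the same dtype, since the file has no header.
loaded = np.fromfile(path, dtype=record_dtype)
```

The file size matches the 180004 bytes computed above. Note that any reader that assumes a one-byte label would need to be adjusted to read two label bytes per record.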
Upvotes: 1