Reputation: 3650
I have a dataset of images as a NumPy array with shape (number of images, height, width, colour channels). I would like to split it into batches and feed it to TensorFlow. What is a good way to do it?
Upvotes: 1
Views: 12248
Reputation: 543
If you have already created your dataset, you can just use batch()
to create batches of the data.
>>> dataset = tf.data.Dataset.range(8)
>>> dataset = dataset.batch(3)
>>> list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
You can see more details in the TensorFlow documentation for batch()
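For the image array from the question, the same approach works with from_tensor_slices(). A minimal sketch, assuming a placeholder array of zeros standing in for the real images:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the question's image array:
# (number of images, height, width, colour channels)
images = np.zeros((10, 28, 28, 3), dtype=np.float32)

# Build a dataset directly from the NumPy array, then batch it.
dataset = tf.data.Dataset.from_tensor_slices(images).batch(4)

for batch in dataset:
    print(batch.shape)  # (4, 28, 28, 3), (4, 28, 28, 3), (2, 28, 28, 3)
```

Note that the final batch is smaller when the batch size does not divide the number of images evenly; pass drop_remainder=True to batch() if every batch must have the same size.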
Upvotes: 0
Reputation: 2948
There's a small error in Thomas Pinetz's answer and I can't make comments yet, so here's an extra answer.
int(len(array)/batch_size)
will round the division down to the nearest integer, so the last partial batch would never be processed. To round the division up instead, you can use ceiling division:
ceil_int = -(-a // b)
In addition you might end up with the last batch being very tiny compared to the rest. You can modify your batch size slightly to make this less likely to happen. The complete code is shown below:
def ceil(a, b):
    return -(-a // b)

n_samples = len(array)
better_batch_size = ceil(n_samples, ceil(n_samples, batch_size))
for i in range(ceil(n_samples, better_batch_size)):
    batch = array[i * better_batch_size : (i + 1) * better_batch_size]
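To make the effect concrete, here is the snippet above run on hypothetical data (100 samples, requested batch size 32); the array shape and sizes are illustrative, not from the original answer:

```python
import numpy as np

def ceil(a, b):
    # Ceiling division using floor division of negated operands.
    return -(-a // b)

# Hypothetical data: 100 "images" of 28x28 RGB.
array = np.zeros((100, 28, 28, 3))
batch_size = 32

n_samples = len(array)
# Keep the same number of batches, but spread samples evenly.
better_batch_size = ceil(n_samples, ceil(n_samples, batch_size))

sizes = []
for i in range(ceil(n_samples, better_batch_size)):
    batch = array[i * better_batch_size : (i + 1) * better_batch_size]
    sizes.append(len(batch))

print(sizes)  # [25, 25, 25, 25] instead of [32, 32, 32, 4]
```

With the naive batch size the last batch would hold only 4 samples; the adjusted size of 25 yields four equal batches.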
Upvotes: 2
Reputation: 7148
I use something like this:
for bid in range(int(len(array) / batch_size)):
    batch = array[bid * batch_size : (bid + 1) * batch_size]
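A quick sketch on hypothetical data shows the caveat with this loop: when the batch size does not divide the array length, the leftover samples are silently skipped.

```python
import numpy as np

# Hypothetical data: 10 samples, batch size 3.
array = np.arange(10)
batch_size = 3

batches = []
for bid in range(int(len(array) / batch_size)):
    batches.append(array[bid * batch_size : (bid + 1) * batch_size])

# Only 3 batches of 3 are produced; the last sample is never yielded.
print(sum(len(b) for b in batches))  # 9, not 10
```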
Upvotes: 0
Reputation: 2673
First you could use numpy.split to divide your images into batches (sub-ndarrays). Then you could feed them to the tf.Session using the run function with the feed_dict parameter.
I'd also highly recommend looking at the TF MNIST tutorial
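The splitting step can be sketched with NumPy alone; the array shape and the placeholder name in the comments are hypothetical, and the Session/feed_dict part refers to the legacy TF1 API this answer describes:

```python
import numpy as np

# Hypothetical image array: 12 images of 28x28 RGB.
images = np.zeros((12, 28, 28, 3), dtype=np.float32)

# Split along axis 0 into 4 equal batches of 3 images each.
# (np.split requires an even division; np.array_split allows uneven splits.)
batches = np.split(images, 4, axis=0)
print([b.shape for b in batches])  # four (3, 28, 28, 3) arrays

# Each batch would then be fed to the (legacy TF1-style) session as:
#   sess.run(train_op, feed_dict={images_placeholder: batch})
# where `images_placeholder` is a hypothetical tf.placeholder.
```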
Upvotes: 3