Lau

Reputation: 3650

Tensorflow splitting training data to batches

I have a dataset of images stored as a NumPy array with shape (number of images, length, width, colour range). I would like to split it into batches and feed them to TensorFlow. What is a good way to do this?

Upvotes: 1

Views: 12248

Answers (4)

Ege

Reputation: 543

If you have already created your dataset, you can just use batch() to create batches of the data.

>>> dataset = tf.data.Dataset.range(8)
>>> dataset = dataset.batch(3)
>>> list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

You can find more details in the TensorFlow documentation for batch().
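Applied to the question's image array, a minimal sketch might look like this (the shape 10×28×28×3 and the random data are placeholders for the asker's actual images):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the asker's array:
# (number of images, length, width, colour range)
images = np.random.rand(10, 28, 28, 3).astype(np.float32)

# Slice the array along the first axis, then group slices into batches.
dataset = tf.data.Dataset.from_tensor_slices(images)
dataset = dataset.batch(4)  # last batch holds the 2 leftover images

for batch in dataset:
    print(batch.shape)  # (4, 28, 28, 3), (4, 28, 28, 3), (2, 28, 28, 3)
```

Pass drop_remainder=True to batch() if a smaller final batch would be a problem for your model.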

Upvotes: 0

user2653663

Reputation: 2948

There's a small error in Thomas Pinetz's answer, and I can't make comments yet, so here's an extra answer.

int(len(array)/batch_size) rounds the division down to the nearest integer, so if batch_size doesn't divide the length evenly, the last (partial) batch is never processed. To round the division up instead you can use

ceil_int = -(-a//b)

In addition, you might end up with the last batch being much smaller than the rest. You can adjust the batch size slightly to make this less likely. The complete code is shown below:

def ceil(a, b):
    # Integer ceiling division: rounds a / b up.
    return -(-a // b)

n_samples = len(array)

# Re-derive the batch size so the batches come out as even as possible.
better_batch_size = ceil(n_samples, ceil(n_samples, batch_size))

for i in range(ceil(n_samples, better_batch_size)):
    batch = array[i * better_batch_size: (i + 1) * better_batch_size]
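To see the effect with concrete (hypothetical) numbers: 100 samples at a requested batch size of 32 would give batches of 32, 32, 32, 4, while the re-derived batch size evens them out.

```python
def ceil(a, b):
    # Integer ceiling division: rounds a / b up.
    return -(-a // b)

n_samples = 100
batch_size = 32

# With the raw batch size, the last batch would hold only 4 samples.
n_batches = ceil(n_samples, batch_size)          # 4 batches needed
better_batch_size = ceil(n_samples, n_batches)   # 25 samples per batch

sizes = [min((i + 1) * better_batch_size, n_samples) - i * better_batch_size
         for i in range(ceil(n_samples, better_batch_size))]
print(sizes)  # [25, 25, 25, 25]
```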

Upvotes: 2

Thomas Pinetz

Reputation: 7148

I use something like this:

for bid in range(int(len(array)/batch_size)):
    batch = array[bid*batch_size:(bid+1)*batch_size]

Upvotes: 0

user3813674

Reputation: 2673

First, you could use numpy.split to divide your images into batches (sub-ndarrays). Then you could feed each batch to a tf.Session via the run function's feed_dict parameter.
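A rough sketch of that approach, using the TF1-style Session API (via tf.compat.v1 on current TensorFlow); the array shape, the reduce_mean stand-in for a real model, and the choice of 4 equal splits are all illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # feed_dict belongs to the TF1 graph API

# Hypothetical image array: (number of images, length, width, colour range)
images = np.random.rand(12, 28, 28, 3).astype(np.float32)

# numpy.split requires the split count to divide the first axis evenly;
# 12 images / 4 splits -> four batches of 3 images each.
batches = np.split(images, 4)

x = tf.compat.v1.placeholder(tf.float32, shape=(None, 28, 28, 3))
mean = tf.reduce_mean(x)  # stand-in for a real model

with tf.compat.v1.Session() as sess:
    for batch in batches:
        print(sess.run(mean, feed_dict={x: batch}))
```

If the number of images doesn't divide evenly, numpy.array_split accepts uneven splits.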

I'd also highly recommend looking at the TF MNIST tutorial

Upvotes: 3
