Kaustabh Kakoty

Reputation: 137

How to implement a next_batch() function for custom data in Python

I am currently working on the Cats vs. Dogs classification task on Kaggle, implementing a deep ConvNet. I use the following code for data preprocessing:

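import os
from random import shuffle

import cv2
import numpy as np
from tqdm import tqdm

# TRAIN_DIR (folder of Kaggle images such as "cat.0.jpg") and IMG_SIZE
# (side length to resize to) are assumed to be defined earlier in the script.

def label_img(img):
    # File names look like "cat.0.jpg", so the class is the third-from-last token
    word_label = img.split('.')[-3]
    if word_label == 'cat':
        return [1, 0]   # one-hot label for cat
    elif word_label == 'dog':
        return [0, 1]   # one-hot label for dog

def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR, img)
        # Load as grayscale; cv2.resize expects the target size as a (width, height) tuple
        img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (IMG_SIZE, IMG_SIZE))
        training_data.append([np.array(img), np.array(label)])

    shuffle(training_data)
    return training_data

train_data = create_train_data()

# Stack into arrays; images get a trailing channel axis for the ConvNet
X_train = np.array([i[0] for i in train_data]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
Y_train = np.asarray([i[1] for i in train_data])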

I want to implement a function that replicates the following call from the TensorFlow Deep MNIST tutorial:

batch = mnist.train.next_batch(100)
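One way to get the same calling style is a small wrapper object that keeps a cursor into the arrays and reshuffles whenever an epoch is used up. This is a minimal sketch, not TensorFlow's own `DataSet` class; the `Dataset` name, the `seed` parameter and the `epochs_completed` attribute are hypothetical choices made for illustration. A batch that straddles an epoch boundary is filled with the leftover examples plus the first examples of the freshly shuffled data, so every batch has exactly `batch_size` rows.

```python
import numpy as np


class Dataset:
    """Hypothetical in-memory dataset with an MNIST-style next_batch().

    Not TensorFlow's DataSet class: it serves examples in shuffled order
    and reshuffles once every example has been served (epoch boundary).
    """

    def __init__(self, images, labels, seed=None):
        assert len(images) == len(labels), "images and labels must align"
        self._images = np.asarray(images)
        self._labels = np.asarray(labels)
        self._num_examples = len(self._images)
        self._rng = np.random.default_rng(seed)
        self._index = 0            # position of the next unread example
        self.epochs_completed = 0
        self._shuffle()

    def _shuffle(self):
        # Apply the same random permutation to images and labels
        perm = self._rng.permutation(self._num_examples)
        self._images = self._images[perm]
        self._labels = self._labels[perm]

    def next_batch(self, batch_size):
        """Return (images, labels) with exactly batch_size rows."""
        assert 0 < batch_size <= self._num_examples
        if self._index + batch_size > self._num_examples:
            # Not enough examples left: keep the remainder, reshuffle,
            # then top up the batch from the start of the new epoch.
            rest = self._num_examples - self._index
            tail_x = self._images[self._index:]
            tail_y = self._labels[self._index:]
            self.epochs_completed += 1
            self._shuffle()
            self._index = batch_size - rest
            head_x = self._images[:self._index]
            head_y = self._labels[:self._index]
            return (np.concatenate([tail_x, head_x]),
                    np.concatenate([tail_y, head_y]))
        start = self._index
        self._index += batch_size
        return self._images[start:self._index], self._labels[start:self._index]
```

With the arrays from the question, `ds = Dataset(X_train, Y_train)` followed by `batch = ds.next_batch(100)` returns a pair where `batch[0]` holds the images and `batch[1]` the one-hot labels, which can be fed to your input and label placeholders the same way the tutorial feeds `batch[0]` and `batch[1]`.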

Upvotes: 0

Views: 3526

Answers (2)

Joshua Lim

Reputation: 345

Apart from generating batches, you may also want to randomly reshuffle the data at the start of every epoch, so that each epoch sees the batches in a different order.

import numpy as np

EPOCH = 100
BATCH_SIZE = 128
TRAIN_DATASIZE = X_train.shape[0]          # number of training examples
PERIOD = TRAIN_DATASIZE // BATCH_SIZE      # full batches per epoch (integer division; the remainder is skipped)

for e in range(EPOCH):
    idxs = np.random.permutation(TRAIN_DATASIZE)  # new shuffled ordering each epoch
    X_random = X_train[idxs]
    Y_random = Y_train[idxs]
    for i in range(PERIOD):
        batch_X = X_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        batch_Y = Y_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        # sess, train, X and Y are the session, training op and placeholders from your graph
        sess.run(train, feed_dict={X: batch_X, Y: batch_Y})
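The loop above only runs over full batches, so when the number of examples is not a multiple of `BATCH_SIZE` the last few shuffled examples are skipped in that epoch. If you would rather keep them, the same per-epoch shuffling can be wrapped in a generator. This is a sketch; `iterate_minibatches` and its `shuffle`/`rng` parameters are hypothetical names, not part of TensorFlow or NumPy.

```python
import numpy as np


def iterate_minibatches(X, Y, batch_size, shuffle=True, rng=None):
    """Yield (batch_X, batch_Y) pairs covering one full epoch of X, Y.

    Hypothetical helper: the final batch may be smaller than batch_size
    instead of being dropped.
    """
    n = len(X)
    if shuffle:
        rng = rng if rng is not None else np.random.default_rng()
        idxs = rng.permutation(n)   # fresh ordering for this epoch
    else:
        idxs = np.arange(n)         # keep the original order
    for start in range(0, n, batch_size):
        batch_idxs = idxs[start:start + batch_size]
        yield X[batch_idxs], Y[batch_idxs]
```

The training loop then becomes `for e in range(EPOCH):` with an inner `for batch_X, batch_Y in iterate_minibatches(X_train, Y_train, BATCH_SIZE):` that calls `sess.run` as before. Indexing with the permuted indices avoids copying the whole dataset once per epoch, at the cost of a small gather for each batch.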

Upvotes: 3

Satoshi Kataoka

Reputation: 316

This code is a good example of how to write a function that generates batches.

In short, you only need to allocate two arrays, one for the batch inputs (x_train) and one for the batch labels (y_train), like this:

  batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
  batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)

Then fill in each training example:

  batch_inputs[i] = ...
  batch_labels[i, 0] = ...

Finally, pass the batch to the session:

_, loss_val = session.run([optimizer, loss], feed_dict={train_inputs: batch_inputs, train_labels:batch_labels})
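Applied to the image arrays from the question, the preallocate-and-fill pattern might look like the sketch below. It is an assumption-laden illustration rather than the code referred to above: `generate_batch` and its `start` cursor are hypothetical, indices wrap around to the beginning when the data runs out, and, unlike the 1-D `int32` arrays in the snippet above, the batch arrays take their shape and dtype from `X` and `Y` so that 4-D images and one-hot labels fit.

```python
import numpy as np


def generate_batch(X, Y, batch_size, start):
    """Fill preallocated batch arrays from X, Y beginning at index start.

    Hypothetical helper: indices wrap around to 0 at the end of the data.
    Returns the batch plus the start index for the next call, so the
    caller carries the cursor between calls.
    """
    n = len(X)
    # Preallocate using the per-example shape and dtype of the source arrays
    batch_inputs = np.empty((batch_size,) + X.shape[1:], dtype=X.dtype)
    batch_labels = np.empty((batch_size,) + Y.shape[1:], dtype=Y.dtype)
    for i in range(batch_size):
        j = (start + i) % n       # wrap around past the last example
        batch_inputs[i] = X[j]
        batch_labels[i] = Y[j]
    return batch_inputs, batch_labels, (start + batch_size) % n
```

Starting from `cursor = 0`, each training step would call `batch_X, batch_Y, cursor = generate_batch(X_train, Y_train, 128, cursor)` and feed `batch_X` and `batch_Y` to the session. This version does not reshuffle between passes; the data is only as shuffled as `create_train_data()` left it.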

Upvotes: 0
