csz-carrot

Reputation: 275

TensorFlow data loading randomly slows down during training

I have loaded all of the training data into memory, which only consumes 7% of total memory, and the following framework is used to train the model:

# build graph
......
# data producer
class DataProducer(object):
  # a single feature has multiple labels and needs to be trained separately for each label
  # to avoid copying the features multiple times, self.ft_idxs indexes the relationship between features and labels
  def yield_trn_batch(self, batch_size):
    for i in xrange(0, self.num_data, batch_size):
      fts = self.fts[self.ft_idxs[self.shuffled_idxs[i: i+batch_size]]]
      labels = self.labels[self.shuffled_idxs[i: i+batch_size]]
      yield fts, labels

# training
for feature, label in data.yield_trn_batch(batch_size):
  sess.run(model.train_op, feed_dict={model.feature: feature, model.label: label})

However, the training process randomly slows down when the feature dimensionality is high. My diagnosis so far is as follows:

  1. The graph is predefined, so this is not the situation described in tensorflow slow performance.
  2. The actual running time of sess.run() is stable; the timeline of a training batch (below) looks normal. [timeline of a training batch]
  3. The slow part is data.yield_trn_batch(). At the beginning it took 0.01 s to load one minibatch, but after several epochs it became unstable and sometimes took over 1 s to load a minibatch. However, when I commented out the sess.run() and only ran data.yield_trn_batch(), it was as fast as normal. I don't use queues, so it is probably not the situation in dequeue many operation very slow.

I suspect that running the graph somehow affects the data loading, but I don't know why or how to solve it (maybe by using another thread to load data, as sketched below?). Can anybody help?
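A minimal sketch of that last idea, assuming the DataProducer above (only the names are borrowed from my code; the queue-based prefetcher itself is hypothetical): a background thread keeps a small bounded queue of minibatches filled while the main thread runs sess.run(), so batch preparation and graph execution overlap.

import threading
import Queue  # the `queue` module on Python 3

def threaded_batches(producer, batch_size, capacity=8):
  # Hypothetical prefetcher: `producer` is assumed to be the DataProducer above.
  q = Queue.Queue(maxsize=capacity)
  sentinel = object()

  def worker():
    for batch in producer.yield_trn_batch(batch_size):
      q.put(batch)
    q.put(sentinel)  # signal the consumer that the epoch is finished

  t = threading.Thread(target=worker)
  t.daemon = True
  t.start()
  while True:
    batch = q.get()
    if batch is sentinel:
      return
    yield batch

# usage: same training loop, but batches are prepared ahead of time
# for feature, label in threaded_batches(data, batch_size):
#   sess.run(model.train_op, feed_dict={model.feature: feature, model.label: label})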

Upvotes: 0

Views: 863

Answers (1)

mrry

Reputation: 126154

Based on our conversation in the comments, it appears that the slowdown is due to memory pressure caused by allocating a large number of NumPy arrays. Although the NumPy arrays are properly garbage collected when they are no longer used, the default malloc() implementation will not reuse the freed memory, and will gradually increase the size of the heap (and the virtual size of the process) by calling the brk() system call.
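One allocator-agnostic way to reduce that churn (a sketch of my own, not something from the discussion above; it reuses the attribute names from the question's DataProducer) is to preallocate the batch buffers once and fill them in place with np.take(..., out=...), so roughly the same memory is recycled on every iteration instead of a fresh pair of large arrays being allocated and freed:

import numpy as np

class ReusingDataProducer(object):
  # Hypothetical variant of the question's DataProducer: it fills two
  # preallocated buffers instead of building fresh arrays every minibatch.
  def yield_trn_batch(self, batch_size):
    ft_buf = np.empty((batch_size,) + self.fts.shape[1:], dtype=self.fts.dtype)
    label_buf = np.empty((batch_size,) + self.labels.shape[1:], dtype=self.labels.dtype)
    for i in xrange(0, self.num_data, batch_size):
      idxs = self.shuffled_idxs[i: i+batch_size]
      n = len(idxs)
      # Write the gathered rows into the existing buffers; feed_dict copies
      # the values at sess.run() time, so reusing the buffers is safe.
      np.take(self.fts, self.ft_idxs[idxs], axis=0, out=ft_buf[:n])
      np.take(self.labels, idxs, axis=0, out=label_buf[:n])
      yield ft_buf[:n], label_buf[:n]

This does not change what is fed to TensorFlow; it only changes how often large temporary arrays are allocated and freed.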

One workaround is to switch allocator libraries, which can fix the address-space leak: use the tcmalloc allocator instead of the default malloc() for your TensorFlow process. tcmalloc's allocation policy is better suited to repeatedly allocating and recycling buffers of the same size, and it will not need to grow the heap over time, which should lead to better performance.
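As a rough illustration of that workaround (the script name and library path below are assumptions, and the exact path depends on your distribution; on Ubuntu, tcmalloc typically comes from the google-perftools packages), the allocator can be swapped in by preloading the shared library when the training process is launched:

import os
import subprocess

# Hypothetical launcher: start the (assumed) training script train.py with
# tcmalloc preloaded. Equivalently, from a shell:
#   LD_PRELOAD=/usr/lib/libtcmalloc.so.4 python train.py
env = dict(os.environ)
env['LD_PRELOAD'] = '/usr/lib/libtcmalloc.so.4'  # adjust the path for your system
subprocess.check_call(['python', 'train.py'], env=env)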

Upvotes: 1
