Reputation: 275
I have loaded all the training data into memory; it only consumes 7% of the total memory. The following framework is used to train the model:
# build graph
......
# data producer
class DataProducer(object):
    # A single feature has multiple labels and needs to be trained separately for each label.
    # To avoid copying the features multiple times, I use self.ft_idxs to index the
    # relationships between features and labels.
    def yield_trn_batch(self, batch_size):
        for i in xrange(0, self.num_data, batch_size):
            fts = self.fts[self.ft_idxs[self.shuffled_idxs[i: i+batch_size]]]
            labels = self.labels[self.shuffled_idxs[i: i+batch_size]]
            yield fts, labels

# training
for feature, label in data.yield_trn_batch(batch_size):
    sess.run(model.train_op, feed_dict={model.feature: feature, model.label: label})
However, the training process randomly slows down when the feature dimensionality is high. My diagnoses so far are as follows:
I guess that running the graph has somehow affected the data loading, but I don't know why or how to solve it. Maybe I should use another thread to load the data, along the lines of the sketch below? Can anybody help fix the problem?
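This is roughly what I have in mind for the threaded approach (only a sketch: it reuses the data, model and sess names from the code above, while the queue capacity and the helper name are made up):

    import threading
    import Queue  # Python 2 module name; it is "queue" on Python 3

    def run_threaded_training(data, model, sess, batch_size, capacity=10):
        # A bounded queue decouples batch preparation from graph execution.
        batch_queue = Queue.Queue(maxsize=capacity)

        def producer():
            for feature, label in data.yield_trn_batch(batch_size):
                batch_queue.put((feature, label))
            batch_queue.put(None)  # sentinel: no more batches

        t = threading.Thread(target=producer)
        t.daemon = True
        t.start()

        # Consume batches on the main thread and run the training op.
        while True:
            item = batch_queue.get()
            if item is None:
                break
            feature, label = item
            sess.run(model.train_op,
                     feed_dict={model.feature: feature, model.label: label})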
Upvotes: 0
Views: 863
Reputation: 126154
Based on our conversation in the comments, it appears that the slowdown is due to memory pressure caused by allocating a large number of NumPy arrays. Although the NumPy arrays are properly garbage collected when they are no longer used, the default malloc() implementation will not reuse the freed memory; instead it gradually increases the size of the heap (and the virtual size of the process) by calling the brk() system call.
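If you want to confirm this on your setup, one way is to watch the virtual size of the process from inside the training loop (a Linux-only diagnostic sketch, not part of the fix; the helper name is made up):

    def vm_size_kb():
        # Read the current process's virtual memory size (in kB) from /proc.
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmSize:'):
                    return int(line.split()[1])

    # e.g. print vm_size_kb() every few hundred training steps; under the behaviour
    # described above it keeps growing even though the NumPy batches are being freed.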
One workaround is to switch the allocator library, which can fix this kind of address-space leak: use the tcmalloc allocator instead of the default malloc() for your TensorFlow process. The allocation policy in tcmalloc is better suited to allocating and recycling buffers of the same size repeatedly, and it will not need to grow the heap over time, which should lead to better performance.
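For example, on a Debian/Ubuntu-style system you can usually preload tcmalloc without rebuilding anything. The package name, library path and script name below are assumptions and may differ on your machine:

    # Install gperftools, which provides tcmalloc (package name varies by distribution).
    sudo apt-get install google-perftools

    # Preload tcmalloc for the Python process that runs your training script.
    LD_PRELOAD="/usr/lib/libtcmalloc.so.4" python train.py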
Upvotes: 1