Reputation: 19957
I'm building a text classification model and have a large sparse TF-IDF matrix of shape (81062, 100000).
The input_fn function is defined as:
# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'tfidf': X_train_tfidf.todense()}, y=y_train.values,
    batch_size=batch_size, num_epochs=None, shuffle=True)
When I tried to execute it, it gave me the following error:
MemoryError Traceback (most recent call last)
I then tried to build an input_fn using the tf.data.Dataset module:
def input_fn():
    dataset = tf.contrib.data.Dataset.from_sparse_tensor_slices((X_train_tfidf, y_train.values))
    dataset = dataset.repeat().shuffle(buff).batch(batch_size)
    x, y = dataset.make_one_shot_iterator().get_next()
    return x, y
However, this gives me the following error:
TypeError: `sparse_tensor` must be a `tf.SparseTensor` object.
Basically, what I want to do is feed the training data to a neural network in small batches (for SGD) directly from a scipy sparse matrix, without densifying it, but I can't find the correct way to do it.
Can someone please help?
Upvotes: 0
Views: 172
Reputation: 922
The TypeError indicates that from_sparse_tensor_slices requires its input to be a tf.SparseTensor object, not a scipy sparse matrix. See: https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset#from_sparse_tensor_slices.
Packing your training matrix together with your labels into a single tf.SparseTensor should solve the problem.
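As a starting point, here is a minimal sketch of how you might convert a scipy sparse matrix into the (indices, values, dense_shape) triple that the tf.SparseTensor constructor expects. Only the conversion is shown with numpy/scipy; the small X matrix and the helper name to_sparse_tensor_args are illustrative stand-ins for your X_train_tfidf:

```python
import numpy as np
import scipy.sparse as sp

def to_sparse_tensor_args(mat):
    """Convert a scipy sparse matrix into the three arguments
    (indices, values, dense_shape) accepted by tf.SparseTensor."""
    coo = mat.tocoo()  # COO format exposes explicit row/col index arrays
    indices = np.column_stack([coo.row, coo.col]).astype(np.int64)
    values = coo.data
    dense_shape = np.array(coo.shape, dtype=np.int64)
    return indices, values, dense_shape

# Tiny stand-in for X_train_tfidf:
X = sp.csr_matrix(np.array([[0.0, 1.5],
                            [2.0, 0.0]]))
indices, values, dense_shape = to_sparse_tensor_args(X)
```

With TensorFlow available you would then build the tensor as `tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)` and pass that (rather than the scipy matrix) to from_sparse_tensor_slices.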
Upvotes: 1