Allen Qin

Reputation: 19957

TensorFlow: MemoryError when loading a SciPy sparse matrix into input_fn

I'm building a text classification model and have built a large sparse TF-IDF matrix with shape (81062, 100000).

The input_fn function is defined as:

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'tfidf': X_train_tfidf.todense()}, y=y_train.values,
    batch_size=batch_size, num_epochs=None, shuffle=True)

When I try to execute it, I get the following error:

MemoryError                               Traceback (most recent call last)

I then tried to build an input_fn using the data.Dataset module:

def input_fn():
    dataset = tf.contrib.data.Dataset.from_sparse_tensor_slices((X_train_tfidf, y_train.values))
    dataset = dataset.repeat().shuffle(buff).batch(batch_size)
    x, y = dataset.make_one_shot_iterator().get_next()
    return x, y

However, this raises the following error:

TypeError: `sparse_tensor` must be a `tf.SparseTensor` object.

What I want is to feed the training data from a sparse matrix to a neural network in small batches and train it with SGD, but I can't find the correct way to do it.

Can someone please help?

Upvotes: 0

Views: 172

Answers (1)

Akshay Agrawal

Reputation: 922

The TypeError occurs because from_sparse_tensor_slices requires its input to be an instance of tf.SparseTensor; in your code it receives a Python tuple containing a SciPy sparse matrix and a label array instead. See: https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset#from_sparse_tensor_slices.
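A tf.SparseTensor is built from three arrays: the (row, column) indices of the non-zero entries, their values, and the dense shape. As a rough sketch, those three arrays could be extracted from a SciPy sparse matrix like this. The helper name to_sparse_tensor_components and the tiny demo matrix are hypothetical stand-ins, not part of your code or of TensorFlow:

```python
import numpy as np
import scipy.sparse as sp


def to_sparse_tensor_components(matrix):
    """Return (indices, values, dense_shape) for a SciPy sparse matrix.

    Hypothetical helper: the three arrays match the arguments that
    tf.SparseTensor(indices, values, dense_shape) takes.
    """
    coo = sp.coo_matrix(matrix)  # COO exposes row/col/data arrays directly
    # Sort entries in row-major order (by row, then by column).
    order = np.lexsort((coo.col, coo.row))
    indices = np.stack([coo.row[order], coo.col[order]], axis=1).astype(np.int64)
    values = coo.data[order]
    dense_shape = np.array(coo.shape, dtype=np.int64)
    return indices, values, dense_shape


# Tiny stand-in for X_train_tfidf, used only for illustration.
demo = sp.csr_matrix(np.array([[0.0, 1.5, 0.0],
                               [2.0, 0.0, 3.0]]))
indices, values, dense_shape = to_sparse_tensor_components(demo)
```

With TensorFlow available, the result would be passed on as tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape). Only the non-zero entries are stored, so the matrix is never densified.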

Packing your training matrix along with your labels into a single SparseTensor should solve the problem.
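As for the original MemoryError: calling todense() on the full matrix asks for 81062 × 100000 values at once, which is roughly 65 GB assuming 8-byte floats. A different approach from the SparseTensor route, shown here only as a sketch, is to keep the matrix sparse and densify one mini-batch at a time. The function name iter_dense_batches, the seed parameter, and the random demo data are all assumptions made for illustration:

```python
import numpy as np
import scipy.sparse as sp


def iter_dense_batches(features, labels, batch_size, seed=0):
    """Yield (dense_batch, label_batch) pairs for one shuffled epoch.

    Hypothetical helper: only batch_size rows are densified at a time.
    """
    features = sp.csr_matrix(features)  # CSR supports efficient row slicing
    rng = np.random.RandomState(seed)
    order = rng.permutation(features.shape[0])  # shuffled row indices
    for start in range(0, len(order), batch_size):
        rows = order[start:start + batch_size]
        dense = features[rows].toarray().astype(np.float32)
        yield dense, labels[rows]


# Small random stand-ins for X_train_tfidf and y_train.values.
demo_x = sp.random(10, 6, density=0.3, format="csr", random_state=0)
demo_y = np.arange(10)
batches = list(iter_dense_batches(demo_x, demo_y, batch_size=4))
```

Each yielded batch holds only batch_size × 100000 values for your matrix, for example about 51 MB for 128 rows of float32. A generator like this could in principle be wrapped with tf.data.Dataset.from_generator, though I have not tested that against your estimator.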

Upvotes: 1
