Donbeo

Reputation: 17617

TensorFlow create dataset from numpy array

TensorFlow has a nice built-in way to store data. It is used, for example, to store the MNIST data in the tutorial:

>>> mnist
<tensorflow.examples.tutorials.mnist.input_data.read_data_sets.<locals>.DataSets object at 0x10f930630>

Suppose I have input and output NumPy arrays:

>>> x = np.random.normal(0,1, (100, 10))
>>> y = np.random.randint(0, 2, 100)

How can I transform them into a TF dataset?

I want to use functions like next_batch.

Upvotes: 21

Views: 35546

Answers (3)

MajidL

Reputation: 731

TensorFlow recently added a feature to its Dataset API for consuming NumPy arrays. See here for details.

Here is the snippet that I copied from there:

# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npz") as data:  # an .npz archive; a plain .npy array cannot be used in `with`
  features = data["features"]
  labels = data["labels"]

# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
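
The snippet above targets the TensorFlow 1.x graph API (placeholders and a session). As a minimal sketch, assuming TensorFlow 2.x with eager execution enabled, the arrays from the question can be passed straight to `from_tensor_slices` and iterated in batches. The batch size of 32 and the `batch_shapes` list are illustrative choices, not part of the original answer.

```python
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution

# Arrays shaped like the ones in the question.
x = np.random.normal(0, 1, (100, 10))
y = np.random.randint(0, 2, 100)

# Each dataset element is one (row of x, entry of y) pair.
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Shuffle over the whole array, then group rows into batches of 32.
dataset = dataset.shuffle(buffer_size=len(x)).batch(32)

# Iterating the dataset plays the role of repeated next_batch calls.
batch_shapes = []
for batch_x, batch_y in dataset:
    batch_shapes.append((tuple(batch_x.shape), tuple(batch_y.shape)))
```

With 100 rows and a batch size of 32, the last batch is smaller than the others. Passing arrays directly should be fine for small data like this; for large arrays the placeholder approach above avoids embedding the data in the graph.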

Upvotes: 3

WenFeng Luo

Reputation: 1

As an alternative, you can use tf.train.batch() to create batches of your data, which also removes the need for tf.placeholder. Refer to the documentation for more details.

>>> images = tf.constant(X, dtype=tf.float32) # X is a np.array
>>> labels = tf.constant(y, dtype=tf.int32)   # y is a np.array
>>> batch_images, batch_labels = tf.train.batch([images, labels], batch_size=32, capacity=300, enqueue_many=True)

Upvotes: 0

Ian Goodfellow

Reputation: 2604

The Dataset object is only part of the MNIST tutorial, not the main TensorFlow library.

You can see where it is defined here:

GitHub Link

The constructor accepts images and labels arguments, so presumably you can pass your own arrays there.
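
If depending on the tutorial module is undesirable, the same idea can be sketched in plain NumPy. The class below is a hypothetical helper, not the tutorial's class or any TensorFlow API; the name `ArrayDataset`, the `seed` parameter, and the choice to drop leftover rows at the end of an epoch are all assumptions made for illustration.

```python
import numpy as np


class ArrayDataset:
    """Hypothetical helper (not part of TensorFlow): serves shuffled mini-batches."""

    def __init__(self, images, labels, seed=0):
        # Each row of `images` must correspond to one entry of `labels`.
        assert images.shape[0] == labels.shape[0]
        self._images = images
        self._labels = labels
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(images.shape[0])
        self._pos = 0
        self.epochs_completed = 0

    def next_batch(self, batch_size):
        n = self._images.shape[0]
        assert batch_size <= n
        if self._pos + batch_size > n:
            # Too few rows left for a full batch: drop them,
            # reshuffle, and start a new epoch.
            self.epochs_completed += 1
            self._order = self._rng.permutation(n)
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self._images[idx], self._labels[idx]


# Usage with arrays shaped like the ones in the question.
x = np.random.normal(0, 1, (100, 10))
y = np.random.randint(0, 2, 100)
data = ArrayDataset(x, y)
batch_x, batch_y = data.next_batch(32)
```

Every call returns a full batch of `batch_size` rows, which keeps shapes constant at the cost of skipping a few rows per epoch.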

Upvotes: 9
