Reputation: 33
I have a large numpy array (X) which I can load into CPU memory, but it is too big for the GPU/TensorFlow. I would like to perform array operations on X using TensorFlow, so I break the array into batches (using numpy), feed them to TensorFlow, and finally concatenate the output arrays to get the numpy array Y. I am new to TensorFlow, so I think there should be a better/faster way to feed in the numpy array.
import numpy as np
import tensorflow as tf

#X is a large numpy array
#batches is an integer which defines the number of batches
X_list = np.array_split(X, batches)
X_tf = tf.placeholder(tf.float32)
Y_tf = some_function(X_tf)
init = tf.global_variables_initializer()
Y_list = []
for batch in range(batches):
    sess = tf.Session()
    sess.run(init)
    Y_list.append(sess.run(Y_tf, feed_dict={X_tf: X_list[batch]}))
    sess.close()
Y = np.hstack(Y_list)
Upvotes: 2
Views: 1919
Reputation: 75
The placeholder method greatly reduces the speed at which data is fed into the system, by as much as 32%. For a more detailed explanation I recommend reading these great course notes: lecture 03. The ideal would be not to use placeholders at all and instead embed the data directly in the graph; however, because your dataset is large, you can run into the 2GB limit for the tf.GraphDef protocol buffer (see here).
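To illustrate, here is a minimal sketch of the placeholder-free approach (tf.square is just a stand-in for whatever operation you run on X): the whole array gets serialized into the GraphDef, which is what makes the 2GB limit a problem for large X.
import numpy as np
import tensorflow as tf

#X is embedded in the graph as a constant, so no feed_dict is needed,
#but the serialized graph now contains all of X and is capped at 2GB
X = np.random.rand(1000, 10).astype(np.float32)
X_const = tf.constant(X)
Y_tf = tf.square(X_const)  #stand-in for some_function
with tf.Session() as sess:
    Y = sess.run(Y_tf)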
Upvotes: 0
Reputation: 1266
You should look at the TensorFlow Dataset class (tf.data), as it is built to handle large numpy arrays. As long as the array fits in memory, it can be loaded and batched however you want.
A basic implementation would look like this (more detail here):
#load np array X
#make a placeholder for the dataset
X_placeholder = tf.placeholder(dtype=tf.float32, shape=X.shape)
#make a dataset from the placeholder
dataset = tf.data.Dataset.from_tensor_slices(X_placeholder)
#batch the data
dataset = dataset.batch(batch_size)
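Continuing the snippet above, a rough sketch of how the batched dataset could be consumed in TF 1.x, reusing some_function from the question: the initializable iterator feeds the whole array once, and the batches are then pulled inside the graph instead of through repeated feed_dict calls.
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()
Y_tf = some_function(next_batch)  #some_function from the question

Y_list = []
with tf.Session() as sess:
    #the array is fed a single time when the iterator is initialized
    sess.run(iterator.initializer, feed_dict={X_placeholder: X})
    while True:
        try:
            Y_list.append(sess.run(Y_tf))
        except tf.errors.OutOfRangeError:
            break
Y = np.concatenate(Y_list)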
Upvotes: 2