Anand Chandra

Reputation: 33

feeding large numpy arrays to tensorflow

I have a large NumPy array (X) which I can load onto the CPU, but it is too big for the GPU/TensorFlow. I would like to perform array operations on X using TensorFlow, so I break the array up into batches (using NumPy), feed them to TensorFlow, and finally concatenate the output arrays to get the NumPy array Y. I am new to TensorFlow, so I think there should be a better/faster way to feed in the NumPy array.

import numpy as np
import tensorflow as tf

# X is a large numpy array
# batches is an integer which defines the number of batches

# split X into `batches` chunks along the first axis
X_list = np.array_split(X, batches)
Y_list = []

X_tf = tf.placeholder(tf.float32)
Y_tf = some_function(X_tf)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for batch in range(batches):
        # run the graph on one chunk and collect the result
        Y_list.append(sess.run(Y_tf, feed_dict={X_tf: X_list[batch]}))

Y = np.hstack(Y_list)

Upvotes: 2

Views: 1919

Answers (2)

nugrinovic

Reputation: 75

The placeholder method greatly reduces the speed at which data is fed into the system, by as much as 32%. For a more detailed explanation I recommend reading these great course notes: lecture 03. Ideally you would not use placeholders at all and keep the data in the graph; however, because your dataset is large, embedding it as a constant can run into the 2 GB limit of the tf.GraphDef protocol buffer (see here).
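A minimal sketch of this trade-off (assuming a small made-up array X and a stand-in reduce_sum computation; TF 1.x API, not the asker's actual some_function):

import numpy as np
import tensorflow as tf

X = np.random.rand(100000, 64).astype(np.float32)  # assumed example data

# Option A: embed the data as a constant in the graph.
# No feed_dict overhead, but X is serialized into the GraphDef,
# so a very large array will hit the ~2 GB limit mentioned above.
X_const = tf.constant(X)
Y_const = tf.reduce_sum(X_const, axis=1)  # stand-in for the real computation

# Option B: keep the data out of the graph with a placeholder.
# The graph stays small, at the cost of copying data in via feed_dict.
X_ph = tf.placeholder(tf.float32, shape=X.shape)
Y_ph = tf.reduce_sum(X_ph, axis=1)

with tf.Session() as sess:
    y_a = sess.run(Y_const)                      # no feeding needed
    y_b = sess.run(Y_ph, feed_dict={X_ph: X})    # data copied on each call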

Upvotes: 0

zephyrus

Reputation: 1266

You should look at the TensorFlow dataset class (tf.data.Dataset), as it is capable of handling large NumPy arrays. As long as the array fits in memory, it can be loaded and batched however you want.

A basic implementation would look like this (more detail here):

import numpy as np
import tensorflow as tf

# load the numpy array X

# make a placeholder for the dataset so X is not embedded in the graph
X_placeholder = tf.placeholder(dtype=tf.float32, shape=X.shape)

# build the dataset from the placeholder
dataset = tf.data.Dataset.from_tensor_slices(X_placeholder)

# batch
dataset = dataset.batch(batch_size)
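To complete the picture, here is a hedged sketch of how the batched dataset could then be consumed, continuing from the snippet above (assuming a stand-in reduce_sum computation in place of the asker's some_function; TF 1.x API):

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

Y_batch = tf.reduce_sum(next_batch, axis=1)  # stand-in for the real computation

Y_list = []
with tf.Session() as sess:
    # X is fed only once, when the iterator is initialized
    sess.run(iterator.initializer, feed_dict={X_placeholder: X})
    while True:
        try:
            Y_list.append(sess.run(Y_batch))
        except tf.errors.OutOfRangeError:
            break

# stitch the per-batch results back together
Y = np.concatenate(Y_list)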

Upvotes: 2
