stefano

Reputation: 369

Tensorflow: share value for two different variables within same operation

I have been experimenting with TensorFlow (TF) lately and I came across this problem: say I want to compute the value and the gradient of the function

f(x) = \sum_{ijk} J_{ijk} x_i x_j x_k

where the x's are indexed differently but all refer to the same vector x, and the J's are random constants (in physics this is a spin-glass model). The gradient with respect to x_k is then simply

grad_k(x) = \sum_{ij} J_{ijk} x_i x_j

hence f sums over N^3 terms, while each of the N components of grad f sums over N^2 terms. I have implemented f by generating all the terms of the sum as a rank-3 tensor and sum-reducing over all its entries. To differentiate, I then apply

tf.gradients(f, xk)[0]

where f is the loss function and xk a variable. Here's an MWE where all J's are assumed to be 1:

import numpy as np
import tensorflow as tf

# First I define the variable
n = 10  # size of x
x1 = tf.Variable(tf.zeros([n], dtype='float64'))
x2 = tf.placeholder(tf.float64, shape=[n])

# Here I define the cost function
f_tensor = tf.mul(tf.mul(tf.reshape(x1, [n]),
                         tf.reshape(x2, [n,1])),
                  tf.reshape(x2, [n,1,1]))
f = tf.reduce_sum(f_tensor)

session = tf.Session()
init = tf.initialize_all_variables()
session.run(init)

# Run on test array
xtest = np.ones(n)
res = session.run([f, tf.gradients(f, x1)[0]],
                  feed_dict={x1 : xtest,
                             x2 : xtest})

assert res[0] == 1000
assert all(res[1] == 100)
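
For reference, the same construction can be checked in plain NumPy, since the tf.reshape/tf.mul pattern is just broadcasting (a minimal sketch mirroring the MWE above; the names are mine):

import numpy as np

n = 10
x = np.ones(n)

# Broadcasting the three reshaped views gives the rank-3 tensor of all
# products x_i * x_j * x_k, exactly like f_tensor above.
f_tensor = x.reshape(n) * x.reshape(n, 1) * x.reshape(n, 1, 1)
assert f_tensor.shape == (n, n, n)
assert f_tensor.sum() == 1000  # n^3 terms, each equal to 1

# Gradient wrt the first factor only: each entry is sum_ij x_i x_j = n^2.
grad = np.full(n, x.sum() ** 2)
assert all(grad == 100)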

I need to call run many times independently, and since x1 and x2 refer to the same vector I want to reduce the number of variable assignments to just one.

Some profiling on a related example with n = 200 (on a GeForce GTX 650) showed that assignment is the most expensive operation when performing the computation on a GPU; the results are similar for this MWE. The overhead obviously gets worse with increasing n, partially neutralising the benefit of using GPUs.

Any suggestion on how I could reduce this overhead by transferring x only once?

Any suggestion on how to reduce any other overhead would also be immensely appreciated.

EDIT

To show the problem in action, I'll follow the suggestion by mrry. If I were to replace all instances of x2 with x1, the MWE would look like this:

import numpy as np
import tensorflow as tf

# First I define the variable
n = 10  # size of x
x1 = tf.Variable(tf.zeros([n], dtype='float64'))

# Here I define the cost function
f_tensor = tf.mul(tf.mul(tf.reshape(x1, [n]),
                         tf.reshape(x1, [n,1])),
                  tf.reshape(x1, [n,1,1]))
f = tf.reduce_sum(f_tensor)

session = tf.Session()
init = tf.initialize_all_variables()
session.run(init)

# Run on test array
xtest = np.ones(n)
session.run(x1.assign(xtest))
res = session.run([f, tf.gradients(f, x1)[0]])

assert res[0] == 1000
for g in res[1]:
    assert g == 100

and the second assertion would fail because each entry of the gradient would be 300 instead of the correct 100. The reason is that, while x_i, x_j, x_k all refer to the same vector, they are symbolically distinct: replacing them all with the same variable results in the derivative of x^3, which is 3x^2, hence the result of this second MWE.
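
For concreteness, here is the arithmetic behind the 300, with all J's equal to 1 and x_i = 1:

$$ \frac{\partial}{\partial x_k} \Big( \sum_i x_i \Big)^3 = 3 \Big( \sum_i x_i \Big)^2 = 3 \sum_{ij} x_i x_j = 3n^2 = 300 \quad (n = 10). $$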

P.S. I have also explicitly assigned x1 for clarity

Upvotes: 3

Views: 664

Answers (2)

Pratik C

Reputation: 53

I couldn't comment above (not enough reputation), but note that the analytical gradient should be

$$ \frac{\partial f}{\partial x_k} = \sum_{ij} J_{ijk} x_i x_j + \sum_{ij} J_{ikj} x_i x_j + \sum_{ij} J_{kij} x_i x_j. $$
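
As a quick check of this against the question's second MWE, with all $J_{ijk} = 1$ and $x_i = 1$, each of the three sums equals $n^2 = 100$, so

$$ \frac{\partial f}{\partial x_k} = 3n^2 = 300, $$

which matches the 300 per entry reported in the question's EDIT.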

Upvotes: 1

mrry

Reputation: 126154

One way to achieve your desired outcome is to use the tf.stop_gradient() op to make an efficient copy of the variable x1 without it contributing to the gradient:

import numpy as np
import tensorflow as tf

# First define the variable.
n = 10  # size of x
x1 = tf.Variable(tf.zeros([n], dtype=tf.float64))
x2 = tf.stop_gradient(x1)

# Now define the cost function
f_tensor = tf.mul(tf.mul(tf.reshape(x1, [n]),
                         tf.reshape(x2, [n,1])),
                  tf.reshape(x2, [n,1,1]))
f = tf.reduce_sum(f_tensor)

session = tf.Session()
init = tf.initialize_all_variables()
session.run(init)

# Run on test array
xtest = np.ones(n)
res = session.run([f, tf.gradients(f, x1)[0]],
                  feed_dict={x1 : xtest})

assert res[0] == 1000
for g in res[1]:
    assert g == 100
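
Since x2 is derived from x1, the vector now only has to reach the device once. A sketch of the assign-once pattern the question asks for, reusing the names above (the loop count is arbitrary):

grad = tf.gradients(f, x1)[0]

# Assign x1 once (a single host-to-device transfer) ...
session.run(x1.assign(xtest))

# ... then run the graph repeatedly without a feed_dict.
for _ in range(100):
    res = session.run([f, grad])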

Upvotes: 2
