andrew

Reputation: 41

Tensorflow: How to copy conv layer weights to another variable for use in reinforcement learning?

I'm not sure if this is possible in Tensorflow and I'm concerned I may have to switch over to PyTorch.

Basically, I have this layer:

self.policy_conv1 = tf.layers.conv2d(inputs=self.policy_s, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer)

Which I'm trying to copy into another layer every 100 iterations of training or so:

self.eval_conv1 = tf.layers.conv2d(inputs=self.s, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer)

tf.assign doesn't seem to be the right tool, and the following doesn't seem to work:

self.policy_conv1 = tf.stop_gradient(tf.identity(self.eval_conv1))

Essentially, I want to copy the eval conv layer's weights into the policy conv layer without the two staying tied together every time the graph runs one or the other (which is what happens with the identity snippet above). If someone can point me to the needed code, I would appreciate it.

Upvotes: 4

Views: 2848

Answers (1)

squadrick

Reputation: 770

import numpy as np
import tensorflow as tf

# I'm using placeholders, but it'll work for other inputs as well
ph1 = tf.placeholder(tf.float32, [None, 32, 32, 3])
ph2 = tf.placeholder(tf.float32, [None, 32, 32, 3])

l1 = tf.layers.conv2d(inputs=ph1, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer, name="layer_1")
l2 = tf.layers.conv2d(inputs=ph2, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer, name="layer_2")

sess = tf.Session()
sess.run(tf.global_variables_initializer())

w1 = tf.get_default_graph().get_tensor_by_name("layer_1/kernel:0")
w2 = tf.get_default_graph().get_tensor_by_name("layer_2/kernel:0")

w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

sess.run(tf.assign(w2, w1))
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # 0

w1 = w1 * 2 + 1  # rebinds the Python name to a new tensor; the underlying variable is unchanged
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

layer_1/bias:0 should work for getting the bias terms.
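Putting the two together, you can group the kernel and bias assigns into a single op so one sess.run copies the whole layer. A minimal sketch (the compat.v1 import fallback is an assumption for running under TF 2.x; on TF 1.x the plain import works as in the code above):

```python
import numpy as np
try:
    # Assumption: if TF 2.x is installed, fall back to its TF1 compatibility API
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    import tensorflow as tf

ph1 = tf.placeholder(tf.float32, [None, 32, 32, 3])
ph2 = tf.placeholder(tf.float32, [None, 32, 32, 3])
l1 = tf.layers.conv2d(ph1, filters=16, kernel_size=(8, 8), strides=(4, 4), name="layer_1")
l2 = tf.layers.conv2d(ph2, filters=16, kernel_size=(8, 8), strides=(4, 4), name="layer_2")

# Fetch both layers' kernel and bias tensors by name
g = tf.get_default_graph()
w1 = g.get_tensor_by_name("layer_1/kernel:0")
b1 = g.get_tensor_by_name("layer_1/bias:0")
w2 = g.get_tensor_by_name("layer_2/kernel:0")
b2 = g.get_tensor_by_name("layer_2/bias:0")

# One op that copies kernel and bias together
copy_layer = tf.group(tf.assign(w2, w1), tf.assign(b2, b1))

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(copy_layer)
print(np.sum(sess.run(w1) - sess.run(w2)))  # 0.0
```

Running copy_layer every N training iterations gives the periodic target-network sync the question describes.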

UPDATE:

I found an easier way:

update_weights = [tf.assign(new, old) for (new, old) in
    zip(tf.trainable_variables('new_scope'), tf.trainable_variables('old_scope'))]

Doing a sess.run on update_weights copies the weights from one network to the other. Just remember to build the two networks under separate variable scopes, so tf.trainable_variables can filter them by scope name.
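For completeness, here is the scope-based approach end to end. This is a sketch: the scope names eval_net/policy_net and the build_net helper are illustrative, and the compat.v1 fallback is an assumption for TF 2.x (on TF 1.x the plain import works):

```python
import numpy as np
try:
    # Assumption: if TF 2.x is installed, fall back to its TF1 compatibility API
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    import tensorflow as tf

ph = tf.placeholder(tf.float32, [None, 32, 32, 3])

def build_net(x, scope):
    # Hypothetical helper: builds one conv layer under the given variable scope
    with tf.variable_scope(scope):
        return tf.layers.conv2d(x, filters=16, kernel_size=(8, 8),
                                strides=(4, 4), padding='valid',
                                activation=tf.nn.relu)

eval_out = build_net(ph, 'eval_net')
policy_out = build_net(ph, 'policy_net')

# Pair up variables by creation order; both nets are built identically,
# so kernel matches kernel and bias matches bias
update_weights = [tf.assign(p, e) for p, e in
                  zip(tf.trainable_variables('policy_net'),
                      tf.trainable_variables('eval_net'))]

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(update_weights)

ev, pv = sess.run([tf.trainable_variables('eval_net'),
                   tf.trainable_variables('policy_net')])
print(all(np.allclose(a, b) for a, b in zip(ev, pv)))  # True
```

Note that zip pairs variables by position, so this only copies correctly when both networks create their variables in the same order, which holds when they are built from the same code.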

Upvotes: 7
