
Reputation: 1217

Tensorflow, Keras: How to create a trainable variable that only updates at specific positions?

For example, y=Ax

where A is a diagonal matrix, with its trainable weights (w1, w2, w3) on the diagonal.

A = [w1  0   0
     0   w2  0
     0   0   w3]

How to create such trainable A in Tensorflow or Keras?

If I try A = tf.Variable(np.eye(3)), the total number of trainable weights is 3*3 = 9, not 3, because I only want to update the 3 weights (w1, w2, w3).

A trick may be to use A = tf.Variable([1., 1., 1.]) * np.eye(3), so that the 3 trainable weights are broadcast onto the diagonal of A.
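The broadcasting behind this trick can be checked in plain NumPy (a minimal sketch, not the TensorFlow graph itself; the values 2, 3, 4 are arbitrary stand-ins for w1, w2, w3):

```python
import numpy as np

w = np.array([2.0, 3.0, 4.0])  # stand-ins for the 3 trainable weights
A = w * np.eye(3)              # broadcasting: A[i, j] = w[j] * eye[i, j]

# The diagonal carries w, and every off-diagonal entry stays exactly zero,
# so only 3 values ever influence the product Ax.
assert np.allclose(np.diag(A), w)
assert np.count_nonzero(A - np.diag(np.diag(A))) == 0
```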

My question is:

  1. Would that trick work for my purpose? Would the gradient be correctly calculated?

  2. What if the structure of A is more complicated? E.g. if I want to create:

[image: a more complex example matrix]

where the w1, w2, ..., w6 are weights to be updated.

Upvotes: 6

Views: 1882

Answers (3)

dsalaj

Reputation: 3207

For a more complex case, where A must be divided into sections of which only some are trainable while the others hold fixed values, the easiest approach is to build the individual sections and then concatenate them together.

For example, I needed a weight matrix A of arbitrary size that (for size 4x4, i.e. four distinct 2x2 sections) looks like this:

#  [[0.,   0.,   -0.2,    0.],
#   [0.,   0.,   0.,      -0.2],
#   [0.35, 0.,   train,   train],
#   [0.,   0.35, train,   train]]

Code to make this (here with 3x3 sections, giving a 6x6 matrix; uses the TF 1.x API):

import tensorflow as tf

n_neurons = 3
zero_quarter = tf.zeros((n_neurons, n_neurons))  # upper left quarter is zeros
neg_diag = tf.diag(tf.ones(n_neurons) * -0.2)  # upper right is negative diag
pos_diag = tf.diag(tf.ones(n_neurons) * 0.35)  # lower left is positive diag
# lower right quarter is trainable randomly initialized vars
train_quarter = tf.get_variable(name='TrainableWeights', shape=[n_neurons, n_neurons])

weights_row0 = tf.concat([zero_quarter, neg_diag], axis=1)
weights_row1 = tf.concat([pos_diag, train_quarter], axis=1)

weights = tf.concat([weights_row0, weights_row1], axis=0)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(weights))

And the result is:

[[ 0.          0.          0.         -0.2         0.          0.        ]
 [ 0.          0.          0.          0.         -0.2         0.        ]
 [ 0.          0.          0.          0.          0.         -0.2       ]
 [ 0.35        0.          0.         -0.61401606  0.39812732  0.72078323]
 [ 0.          0.35        0.         -0.34560132  0.40494204  0.36660933]
 [ 0.          0.          0.35        0.34820676  0.5112138  -0.97605824]]

where only the bottom right 3x3 section is trainable.
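The same block-assembly layout can be checked framework-free in NumPy (a sketch of the concatenation pattern only; random values stand in for the trainable quarter):

```python
import numpy as np

n = 3
zero_quarter = np.zeros((n, n))                    # upper left: zeros
neg_diag = np.eye(n) * -0.2                        # upper right: negative diagonal
pos_diag = np.eye(n) * 0.35                        # lower left: positive diagonal
train_quarter = np.random.uniform(-1, 1, (n, n))   # lower right: "trainable" part

row0 = np.concatenate([zero_quarter, neg_diag], axis=1)
row1 = np.concatenate([pos_diag, train_quarter], axis=1)
weights = np.concatenate([row0, row1], axis=0)

print(weights.shape)  # (6, 6)
```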

Upvotes: 1

P-Gn

Reputation: 24581

You have two different tools to address this problem.

  1. You can create the variables you need and rearrange them into the desired form.
  2. You can create more variables than you need, then discard some to reach the desired form.

The two approaches are not exclusive, and you can mix successive steps of type #1 and #2.

For example, for your first example (diagonal matrix), we can use approach #1.

w = tf.Variable(tf.zeros(n))
A = tf.diag(w) # creates a diagonal matrix with elements of w

For your second, more complex example, we could use approach #2.

A = tf.Variable(tf.zeros((n, n)))
A = tf.matrix_band_part(A, 1, 1) # keep only the central band of width 3
A = tf.matrix_set_diag(A, tf.ones(n)) # set diagonal to 1
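The effect of the band-plus-diagonal masking can be sketched in NumPy (an analogue only, not the TF ops; arbitrary values stand in for the variable's contents):

```python
import numpy as np

n = 4
A = np.arange(1.0, n * n + 1).reshape(n, n)  # stand-in for the variable's values

# keep only the central band of width 3 (analogue of tf.matrix_band_part(A, 1, 1))
i, j = np.indices((n, n))
band = np.where(np.abs(i - j) <= 1, A, 0.0)

# set the diagonal to 1 (analogue of tf.matrix_set_diag(A, tf.ones(n)))
np.fill_diagonal(band, 1.0)

print(band)
```

For n = 4 this leaves ones on the diagonal and exactly 6 free off-diagonal entries, matching the w1, ..., w6 of the question's second example.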

Upvotes: 4

Jie.Zhou

Reputation: 1318

Creating the variable as either a vector or a matrix works fine.

For question 1:

Don't worry, the gradients will be calculated correctly.

For question 2:

If the structure becomes more complex, as you mentioned, you can still create a vector variable and then build the matrix from that variable.

Alternatively, you can create a matrix variable and then update only part of it with tf.scatter_update instead of tf.assign.
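In TF 1.x, tf.scatter_update replaces whole slices along the first dimension (rows). The pattern can be sketched in NumPy (an analogue only; the row indices and values are arbitrary):

```python
import numpy as np

A = np.zeros((4, 4))             # stand-in for the matrix variable
indices = [2, 3]                 # rows to update
updates = np.full((2, 4), 0.5)   # new values for those rows

# analogue of tf.scatter_update(A_var, indices, updates):
# only the listed rows change, the rest of A is untouched
A[indices] = updates

print(A)
```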

Upvotes: 0
