Reputation: 1217
For example, y = Ax, where A is a diagonal matrix with its trainable weights (w1, w2, w3) on the diagonal:
$$A = \begin{bmatrix} w_1 & 0 & 0 \\ 0 & w_2 & 0 \\ 0 & 0 & w_3 \end{bmatrix}$$
How can I create such a trainable A in TensorFlow or Keras?
If I try A = tf.Variable(np.eye(3)), the total number of trainable weights would be 3*3 = 9, not 3, whereas I only want to update the three weights (w1, w2, w3).
A trick might be to use A = tf.Variable([1., 1., 1.]) * np.eye(3), so that the three trainable weights are mapped onto the diagonal of A.
My questions are:
1. Would that trick work for my purpose?
2. Would the gradient be calculated correctly?
3. What if A is more complicated, e.g. a matrix in which only the entries w1, w2, ..., w6 are weights to be updated?
Upvotes: 6
Views: 1882
Reputation: 3207
For a more complex case, where A needs to be divided into sections in which only some parts are trainable and the others have fixed, arbitrary values, the easiest approach is to build the individual sections and then concatenate them.
For example, I needed a weight matrix A of arbitrary size that, for size 4x4, looks like this (four distinct 2x2 sections):
# [[0., 0., -0.2, 0.],
# [0., 0., 0., -0.2],
# [0.35, 0., train, train],
# [0., 0.35, train, train]]
Code to make this:
import tensorflow as tf  # TensorFlow 1.x API (tf.diag, tf.get_variable, tf.Session)
n_neurons = 3
zero_quarter = tf.zeros((n_neurons, n_neurons)) # upper left quarter is zeros
neg_diag = tf.diag(tf.ones(n_neurons) * -0.2) # upper right is a negative diagonal
pos_diag = tf.diag(tf.ones(n_neurons) * 0.35) # lower left is a positive diagonal
# lower right quarter is trainable, randomly initialized variables
train_quarter = tf.get_variable(name='TrainableWeights', shape=[n_neurons, n_neurons])
weights_row0 = tf.concat([zero_quarter, neg_diag], axis=1)
weights_row1 = tf.concat([pos_diag, train_quarter], axis=1)
weights = tf.concat([weights_row0, weights_row1], axis=0)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(weights))
And the result is:
[[ 0. 0. 0. -0.2 0. 0. ]
[ 0. 0. 0. 0. -0.2 0. ]
[ 0. 0. 0. 0. 0. -0.2 ]
[ 0.35 0. 0. -0.61401606 0.39812732 0.72078323]
[ 0. 0.35 0. -0.34560132 0.40494204 0.36660933]
[ 0. 0. 0.35 0.34820676 0.5112138 -0.97605824]]
where only the bottom right 3x3 section is trainable.
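The same construction in TensorFlow 2.x might look like the sketch below. This is an assumption-laden port, not the code above: `tf.diag`, `tf.get_variable` and `tf.Session` belong to the 1.x API, so `tf.linalg.diag` and a plain `tf.Variable` stand in for them, and the loss is a toy chosen only to run a gradient check. It shows that gradients flow into the lower-right block only, since the constant blocks are not variables at all.

```python
import tensorflow as tf

n_neurons = 3
zero_quarter = tf.zeros((n_neurons, n_neurons))          # upper left: zeros
neg_diag = tf.linalg.diag(tf.fill([n_neurons], -0.2))    # upper right: -0.2 on the diagonal
pos_diag = tf.linalg.diag(tf.fill([n_neurons], 0.35))    # lower left: 0.35 on the diagonal
train_quarter = tf.Variable(tf.random.normal((n_neurons, n_neurons)),
                            name="TrainableWeights")     # lower right: trainable block

def build_weights():
    # Rebuild the full matrix from its blocks on every call,
    # so it always reflects the current value of train_quarter.
    top = tf.concat([zero_quarter, neg_diag], axis=1)
    bottom = tf.concat([pos_diag, train_quarter], axis=1)
    return tf.concat([top, bottom], axis=0)

with tf.GradientTape() as tape:
    weights = build_weights()
    loss = tf.reduce_sum(weights)  # toy loss for the gradient check

grad = tape.gradient(loss, train_quarter)
print(weights.shape)  # (6, 6)
print(grad.numpy())   # all ones: every entry of the 3x3 block receives a gradient
```

Building the matrix inside a function (or a Keras layer's `call`) matters in eager mode: a tensor concatenated once at setup time would not change when the optimizer updates `train_quarter`.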
Upvotes: 1
Reputation: 24581
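You have two different tools to address this problem:
1. Create exactly the variables you need and rearrange them into the desired form.
2. Create more variables than you need, then discard or overwrite some of them to reach the desired form.
The two approaches are not mutually exclusive; you can combine successive steps of type #1 and #2.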
For your first example (the diagonal matrix), we can use approach #1:
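import tensorflow as tf  # TensorFlow 1.x API
n = 3  # example size
w = tf.Variable(tf.zeros(n))  # the n trainable diagonal entries
A = tf.diag(w) # creates a diagonal matrix with the elements of w on its diagonal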
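A minimal TensorFlow 2.x sketch of approach #1, assuming TF 2.x, where `tf.diag` corresponds to `tf.linalg.diag`. The input values and the summed loss are illustrative only. For y = Ax with A = diag(w), each y_i = w_i x_i, so the gradient of sum(y) with respect to w_i should be x_i, and there are exactly three trainable weights.

```python
import tensorflow as tf

n = 3
w = tf.Variable(tf.ones(n), name="diag_weights")  # the only trainable weights (w1, w2, w3)
x = tf.constant([1.0, 2.0, 3.0])                  # example input vector (illustrative values)

with tf.GradientTape() as tape:
    A = tf.linalg.diag(w)        # 3x3 matrix with w on the diagonal, zeros elsewhere
    y = tf.linalg.matvec(A, x)   # y = A x
    loss = tf.reduce_sum(y)      # toy loss, only used to obtain a gradient

grad = tape.gradient(loss, w)
print(A.numpy())
print(grad.numpy())  # [1. 2. 3.], i.e. d(loss)/dw_i = x_i
```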
For your second, more complex example, we could use approach #2.
import tensorflow as tf  # TensorFlow 1.x API
n = 4  # example size
A = tf.Variable(tf.zeros((n, n)))  # full n x n variable
A = tf.matrix_band_part(A, 1, 1) # keep only the central band of width 3
A = tf.matrix_set_diag(A, tf.ones(n)) # set the diagonal to 1
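A TensorFlow 2.x sketch of approach #2, assuming TF 2.x names (`tf.linalg.band_part` and `tf.linalg.set_diag` correspond to `tf.matrix_band_part` and `tf.matrix_set_diag`) and `n = 4`, so the off-diagonal band holds exactly six free entries. Whether that matches the w1, ..., w6 layout in the question is an assumption. The full n x n variable still exists, but the gradient is zero for every entry that `band_part` discards or `set_diag` overwrites, so an optimizer effectively moves only the band entries.

```python
import numpy as np
import tensorflow as tf

n = 4
A_var = tf.Variable(tf.random.normal((n, n)), name="full_matrix")  # n*n raw weights

with tf.GradientTape() as tape:
    A = tf.linalg.band_part(A_var, 1, 1)   # keep main diagonal plus one band above and below
    A = tf.linalg.set_diag(A, tf.ones(n))  # replace the main diagonal with fixed 1s
    loss = tf.reduce_sum(A)                # toy loss: every kept entry gets gradient 1

grad = tape.gradient(loss, A_var)
print(grad.numpy())
# Non-zero gradient entries mark the weights that can actually be trained.
print("effective trainable entries:", int(np.count_nonzero(grad.numpy())))
```

The trade-off compared with approach #1 is memory and clarity: the discarded entries are still stored and reported as trainable variables, even though they never change.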
Upvotes: 4
Reputation: 1318
For question 1, both approaches work fine: creating a vector variable or creating a matrix variable.
For question 2, don't worry: the gradients will be calculated correctly.
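To check the gradient claim for the trick from the question, here is a small TensorFlow 2.x sketch. It assumes a float variable (an integer variable such as `tf.Variable([1, 1, 1])` cannot be trained by gradient descent) and uses `tf.eye(3)` instead of `np.eye(3)` to keep dtypes consistent; the input values are illustrative. Multiplying a length-3 vector by the identity broadcasts the vector along the rows, so entry (i, j) is w_j * I[i, j], which is exactly diag(w).

```python
import tensorflow as tf

w = tf.Variable([1.0, 1.0, 1.0])          # the three trainable weights
x = tf.constant([[2.0], [3.0], [5.0]])    # example column vector (illustrative values)

with tf.GradientTape() as tape:
    A = w * tf.eye(3)         # broadcasting: A[i, j] = w[j] * eye[i, j], i.e. diag(w)
    y = tf.matmul(A, x)       # y = A x, shape (3, 1)
    loss = tf.reduce_sum(y)   # toy loss

grad = tape.gradient(loss, w)
print(grad.numpy())  # [2. 3. 5.]: each w_i receives x_i, and there are only 3 gradients
```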
If it becomes more complex, as you mentioned, you can still create a vector variable and then build the matrix from that variable.
Alternatively, you can create a matrix variable and then update only part of it with tf.scatter_update instead of tf.assign.
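Building the matrix from a vector variable for an arbitrary pattern could be sketched with `tf.scatter_nd`, which places the entries of a vector at chosen (row, column) positions of an otherwise zero matrix; the fixed entries are then added as a constant. The layout below (a 4x4 matrix with a fixed unit diagonal and w1, ..., w6 on the first off-diagonals) is hypothetical and chosen only for illustration; swap in whatever positions your matrix actually needs.

```python
import tensorflow as tf

n = 4
# Hypothetical layout: (row, col) position of each trainable weight w1..w6.
positions = tf.constant([[0, 1], [1, 2], [2, 3],    # w1, w2, w3 above the diagonal
                         [1, 0], [2, 1], [3, 2]])   # w4, w5, w6 below the diagonal
w = tf.Variable(tf.random.normal([6]), name="w")     # exactly 6 trainable weights
fixed = tf.eye(n)                                    # fixed, non-trainable part: ones on the diagonal

def build_A():
    # Scatter w into an n x n zero matrix, then add the fixed entries.
    return tf.scatter_nd(positions, w, [n, n]) + fixed

with tf.GradientTape() as tape:
    A = build_A()
    loss = tf.reduce_sum(A)  # toy loss

grad = tape.gradient(loss, w)
print(A.numpy())
print(grad.numpy())  # all ones: each of the 6 weights receives a gradient
```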
Upvotes: 0