Reputation: 4152
If my understanding of layers is correct, then layers use tf.Variable as their weight variable, so if a Dense() layer has 3 units it uses something like w = tf.Variable([0.2, 5, 0.9]) for a single instance, and if the input_shape is 2 the variable would be something like w = tf.Variable([[0.2, 5, 0.9], [2, 3, 0.4]])?
Please correct me if I am wrong.
I am learning the very basics of TensorFlow and found some code that I modified as follows:
import tensorflow as tf

weight = tf.Variable([3.2])

def get_lost_loss(w):
    '''
    A very hypothetical function, hence the name
    '''
    return (w**1.3) / 3.1  # just felt like doing it

def calculate_gradient(w):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w)  # calculate loss WITHIN tf.GradientTape()
    grad = tape.gradient(loss, w)  # gradient of loss w.r.t. w
    return grad

# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []
for i in range(50):
    grad = calculate_gradient(weight)
    opt.apply_gradients(zip([grad], [weight]))
    losses.append(get_lost_loss(weight))
Could someone please give me an intuition of what is happening inside tf.GradientTape()? Also, the thing I wanted to ask the most: if I have to do this for weight1 and weight2, whose shapes are [2,3], instead of weight, what modifications should I make to the code?
Please make any assumptions that need to be made. You are all far more skilled than I am in this field.
Upvotes: 1
Views: 2118
Reputation: 1393
Yes, you are right. A layer has two variables. The one you mentioned is called the kernel, and the other one is called the bias. The example below shows this in detail:
import tensorflow as tf
w=tf.Variable([[3.2,5,6,7,5]],dtype=tf.float32)
d=tf.keras.layers.Dense(3,input_shape=(5,)) # Layer d gets inputs with shape (*,5) and generates outputs with shape (*,3)
# It has kernel variable with shape (5,3) and bias variable with shape (3)
print("Output of applying d on w:", d(w))
print("\nLayer d trainable variables:\n", d.trainable_weights)
The output will be something like:
Output of applying d on w: tf.Tensor([[ -0.9845681 -10.321521 7.506028 ]], shape=(1, 3), dtype=float32)
Layer d trainable variables:
[<tf.Variable 'dense_18/kernel:0' shape=(5, 3) dtype=float32, numpy=
array([[-0.8144073 , -0.8408185 , -0.2504158 ],
[ 0.6073988 , 0.09965736, -0.32579994],
[ 0.04219657, -0.33530533, 0.71029276],
[ 0.33406 , -0.673926 , 0.77048916],
[-0.8014116 , -0.27997494, 0.05623555]], dtype=float32)>, <tf.Variable 'dense_18/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
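To connect the shapes: with its default (linear) activation, a Dense layer computes outputs = inputs @ kernel + bias. As a small sanity check (not in the original answer, but reusing the w and d defined above), you can reproduce the layer's output by hand:

# (1,5) @ (5,3) + (3,) -> (1,3), same values as d(w) above
manual = tf.matmul(w, d.kernel) + d.bias
print(manual)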
tf.GradientTape() is used to record the operations on the trainable weights (variables) within its context for automatic differentiation, so that later we can get the derivatives of the loss with respect to those variables.
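Here is a minimal, self-contained sketch of that idea (not part of the question's code): the tape records y = x**2 while it is open, and tape.gradient then returns dy/dx = 2x.

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2                  # this operation is recorded by the tape
dy_dx = tape.gradient(y, x)     # derivative of y w.r.t. x, i.e. 2*x
print(dy_dx.numpy())            # prints 6.0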
Suppose you have two weight variables, weight1 and weight2. First you need to change your loss function to use both variables. Then, in each step, you need to get the derivative of the loss function with respect to both variables and update them to minimize the loss. Please see the code below.
import tensorflow as tf

weight1 = tf.Variable([[3.2, 5, 6], [2, 5, 4]], dtype=tf.float32)  # modified
weight2 = tf.Variable([[1, 2, 3], [1, 4, 3]], dtype=tf.float32)    # modified

def get_lost_loss(w1, w2):  # modified
    '''
    A very hypothetical function, hence the name
    '''
    return tf.reduce_sum(tf.math.add(w1**1.2 / 2, w2**2))  # just felt like doing it

def calculate_gradient(w1, w2):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w1, w2)  # calculate loss WITHIN tf.GradientTape()
    dw1, dw2 = tape.gradient(loss, [w1, w2])  # gradients of loss w.r.t. w1, w2
    return dw1, dw2

# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []
for i in range(500):
    grad_weight1, grad_weight2 = calculate_gradient(weight1, weight2)
    opt.apply_gradients(zip([grad_weight1, grad_weight2], [weight1, weight2]))
    losses.append(get_lost_loss(weight1, weight2))
    print("loss: " + str(get_lost_loss(weight1, weight2).numpy()))
Upvotes: 1