Deshwal

Reputation: 4152

Calculating losses and computing gradients for multiple layers at once in TensorFlow with tf.GradientTape()

If my understanding of layers is correct, a layer uses tf.Variable as its weight variable. So if a Dense() layer has 3 units, it uses something like w = tf.Variable([0.2,5,0.9]) for a single input feature, and if the input_shape is 2, the variable would be something like w = tf.Variable([[0.2,5,0.9],[2,3,0.4]]). Please correct me if I am wrong.

I am learning the very basics of TensorFlow and found some code that I modified as follows:

import tensorflow as tf

weight = tf.Variable([3.2])


def get_lost_loss(w):
    '''
    A very hypothetical function, as the name suggests
    '''
    return (w**1.3)/3.1 # just felt like doing it


def calculate_gradient(w):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w) # calculate loss WITHIN tf.GradientTape()
        
    grad = tape.gradient(loss,w) # gradient of loss wrt. w
    
    return grad


# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []

for i in range(50):
    grad = calculate_gradient(weight)
    opt.apply_gradients(zip([grad],[weight]))
    
    losses.append(get_lost_loss(weight))

Could someone please give me an intuition of what is happening inside tf.GradientTape()? Also, the thing I wanted to ask most: if I have to do this for weight1 and weight2, whose shapes are [2,3] instead of weight's shape, what modifications should be made to the code?

Please make any assumptions that need to be made. You are all far more skilled than I am in this field.

Upvotes: 1

Views: 2118

Answers (1)

Roohollah Etemadi

Reputation: 1393

Yes, you are right. A Dense layer has two variables: the one you mentioned is called the kernel, and the other is called the bias. The example below explains it in detail:

import tensorflow as tf
w=tf.Variable([[3.2,5,6,7,5]],dtype=tf.float32)

d=tf.keras.layers.Dense(3,input_shape=(5,)) # Layer d gets inputs with shape (*,5) and generates outputs with shape (*,3)
                                            # It has kernel variable with shape (5,3) and bias variable with shape (3)
print("Output of applying d on w:", d(w))
print("\nLayer d trainable variables:\n", d.trainable_weights)

The output will be something like:

Output of applying d on w: tf.Tensor([[ -0.9845681 -10.321521    7.506028 ]], shape=(1, 3), dtype=float32)



Layer d trainable variables:
 [<tf.Variable 'dense_18/kernel:0' shape=(5, 3) dtype=float32, numpy=
array([[-0.8144073 , -0.8408185 , -0.2504158 ],
       [ 0.6073988 ,  0.09965736, -0.32579994],
       [ 0.04219657, -0.33530533,  0.71029276],
       [ 0.33406   , -0.673926  ,  0.77048916],
       [-0.8014116 , -0.27997494,  0.05623555]], dtype=float32)>, <tf.Variable 'dense_18/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

tf.GradientTape() records the operations performed on the trainable weights (variables) within its context for automatic differentiation, so that the derivatives of the loss with respect to those variables can be computed later.
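For intuition, here is a minimal sketch using the single variable and loss function from your question; the tape records the forward computation and then replays it backwards to produce the derivative, which you can check against the analytic value:

    import tensorflow as tf

    weight = tf.Variable([3.2])

    with tf.GradientTape() as tape:
        # every operation on a trainable tf.Variable inside this block is recorded
        loss = (weight**1.3) / 3.1

    # the tape differentiates the recorded operations to get d(loss)/d(weight)
    grad = tape.gradient(loss, weight)

    print(grad.numpy())            # gradient from the tape (~0.594)
    print(1.3 * 3.2**0.3 / 3.1)    # analytic derivative of w**1.3/3.1, same value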

Suppose you have two weight variables, weight1 and weight2. First you need to change your loss function to use both variables. Then, at each step, you get the derivatives of the loss function w.r.t. both variables and update them to optimize the loss. Please see the code below.

import tensorflow as tf

weight1 = tf.Variable([[3.2,5,6],[2,5,4]],dtype=tf.float32) #modified
weight2= tf.Variable([[1,2,3],[1,4,3]],dtype=tf.float32)    #modified

def get_lost_loss(w1, w2): #modified
    '''
    A very hypothetical function, as the name suggests
    '''
    return tf.reduce_sum(tf.math.add(w1**1.2/2,w2**2))  # just felt like doing it


def calculate_gradient(w1,w2):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w1,w2) # calculate loss WITHIN tf.GradientTape()
        
    dw1,dw2 = tape.gradient(loss,[w1,w2]) # gradient of loss wrt. w1,w2
    
    return dw1,dw2


# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []

for i in range(500):
    grad_weight1, grad_weight2 = calculate_gradient(weight1,weight2)
    opt.apply_gradients(zip([grad_weight1, grad_weight2],[weight1,weight2]))
    
    losses.append(get_lost_loss(weight1,weight2))
    print("loss: "+str(get_lost_loss(weight1,weight2).numpy()))

Upvotes: 1
