Reputation: 81
bce = tf.keras.losses.BinaryCrossentropy()
ll=bce(y_test[0], model.predict(X_test[0].reshape(1,-1)))
print(ll)
<tf.Tensor: shape=(), dtype=float32, numpy=0.04165391>
print(model.input)
<tf.Tensor 'dense_1_input:0' shape=(None, 195) dtype=float32>
model.output
<tf.Tensor 'dense_3/Sigmoid:0' shape=(None, 1) dtype=float32>
grads=K.gradients(ll, model.input)[0]
print(grads)
None
So here i have Trained a 2 hidden layer neural network, input has 195 features and output is 1 size. I wanted to feed the neural network with validation instances named as X_test one by one with their correct labels in y_test and for each instance calculate the gradients of the output with respect to input, the grads upon printing gives me a None. Your help is appreciated.
Upvotes: 5
Views: 1630
Reputation: 169
One can do this using tf.GradientTape. I wrote the following code to learn a sin wave, and get its derivative in the spirit of this question. I think, it should be possible to extend the following codes in order to compute partial derivatives.
Importing the needed libraries:
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
Create the data:
x = np.linspace(0, 6*np.pi, 2000)
y = np.sin(x)
Defining a Keras NN:
def model_gen(Input_shape):
X_input = Input(shape=Input_shape)
X = Dense(units=64, activation='sigmoid')(X_input)
X = Dense(units=64, activation='sigmoid')(X)
X = Dense(units=1)(X)
model = Model(inputs=X_input, outputs=X)
return model
Training the model:
model = model_gen(Input_shape=(1,))
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.001)
model.compile(loss=losses.mean_squared_error, optimizer=opt)
model.fit(x,y, epochs=200)
To obtain the gradient of the network w.r.t. the input:
x = list(x)
x = tf.constant(x)
with tf.GradientTape() as t:
t.watch(x)
y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
One can further visualise dy_dx to make sure of how smooth the derivative is. Finally, note that one get a smoother derivative when one uses a smooth activation (e.g. sigmoid) instead of Relu as noted here.
Upvotes: 5