Reputation: 9407
I'm studying how to break linear classifiers, but I'm having trouble understanding tf.gradients.
The point of the project is to take a model and train it on the MNIST
dataset. Once it is trained, I take an image, slightly change it, and feed it back to the model. However, when I feed it back, the prediction should be different. For example, if I have an image of a 2 and I want the model to predict a 6, I will change the image slightly so that it still looks like a 2 but the model will think it's a 6.
This is done with a simple equation: we take the derivative of the loss function, take its sign, multiply that by some epsilon value, and add the result to the image. The equation is something like this...
new image = image + (epsilon * sign of derivative of loss function)
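For example, if a single pixel has value 0.50, epsilon is 0.1, and the derivative of the loss at that pixel is -3.2 (numbers made up purely for illustration), then the sign is -1 and the new pixel value is 0.50 + 0.1 * (-1) = 0.40.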
The part that confuses me is tf.gradients. I am looking at an example, but I am having a hard time understanding it.
First, 10 images of a number 2 are extracted. Next, 10 labels are created representing the label 6. So the labels look as follows...
[[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
...etc...
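One way to build labels like these, assuming NumPy is available (the variable names here are just illustrative, not from the original example):

import numpy as np

num_images = 10     # one row per image of a 2
num_classes = 10    # MNIST digits 0-9
target_class = 6    # the class we want the model to predict

# One-hot rows: each row has a single 1 in the target-class column.
y_six = np.zeros((num_images, num_classes))
y_six[:, target_class] = 1.0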
And then the derivative of the cost function is computed as follows (cross_entropy is the cost function)...
im_derivative = tf.gradients(cross_entropy, x)[0]
im_derivative = im_derivative.eval({x: x0,
                                    y_: y_six,
                                    keep_prob: 1.0})
x0 contains the 10 images of a 2 and y_six contains the labels representing the number 6. The sign of this derivative is then used in the equation I demonstrated above.
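Putting it together with the equation from above, the perturbation step would look roughly like this (a sketch only; the epsilon value and the clipping are illustrative and not part of the original code):

import numpy as np

epsilon = 0.1  # illustrative step size

# im_derivative is the evaluated gradient from the snippet above;
# it has the same shape as x0, one gradient value per pixel.
x_adv = x0 + epsilon * np.sign(im_derivative)

# keep pixel values in a valid range (MNIST pixels lie in [0, 1])
x_adv = np.clip(x_adv, 0.0, 1.0)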
My question is this: what exactly is tf.gradients returning, and why is the derivative being evaluated with a label of 6 rather than a label of 2? I'm having a hard time understanding what is being returned and why a fake label is being used. I understand that a fake label is probably necessary to trick the classifier, but it is hard to see this because I don't understand what tf.gradients is returning.
Upvotes: 0
Views: 187
Reputation: 27042
tf.gradients(ys, xs) returns the symbolic partial derivatives of the sum of ys with respect to each x in xs.
In your case, you're defining the partial derivative of cross_entropy with respect to x (and extracting the first, and only, element, since tf.gradients returns a list).
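A minimal sketch of that behaviour (the placeholder shape and the loss are made up here, just to show what tf.gradients returns):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])  # MNIST-sized input, illustrative
loss = tf.reduce_sum(tf.square(x))                 # any scalar loss that depends on x

grads = tf.gradients(loss, x)   # a list, one tensor per entry of xs
dloss_dx = grads[0]             # same shape as x: d(loss)/d(pixel) for every pixel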
The gradient of a cost with respect to some variable tells you how much, and in which direction, you have to change that variable in order to minimize the cost; here the variable is the input image rather than the network parameters.
Hence, since you want to trick the classifier, you compute the gradient of the cost for a certain input using a different (fake) label, in order to find the "indication" (or signal) you have to follow to make the network consider that input a 6.
Upvotes: 1