Reputation: 1
In case any of you don't know, adversarial images are images that belong to a certain class but have been perturbed, with no visually perceptible difference to the human eye, so that the network confidently classifies them as a completely different class.
More information about it here: http://karpathy.github.io/2015/03/30/breaking-convnets/
Using TensorFlow, I have learned a lot about convolutional neural networks.
import tensorflow as tf

def weight_variable(shape):
    # Small random noise breaks symmetry between units
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

# Input placeholder for flattened 28x28 MNIST images
x = tf.placeholder(tf.float32, shape=[None, 784])

# First convolutional layer: 32 feature maps from 5x5 patches
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer: 64 feature maps
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer on the flattened 7x7x64 pooled features
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout before the readout layer
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout layer: 10 logits, one per digit class
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
The challenge is to take an image of the digit 2, correctly labelled as '2', and perturb it so that the network identifies it as '6', changing the pixels so slightly that the difference is imperceptible.
Anyone have any idea where to start with this?
Upvotes: 0
Views: 322
Reputation: 4334
You can start by reading this paper: https://arxiv.org/abs/1412.6572 (for example)
It explains one of the ways to generate adversarial examples by computing gradients of the loss function with respect to inputs.
Have a look at tf.gradients()
Once you have defined your loss function (for example, cross-entropy), you do something like:
grads = tf.gradients(loss, [x])[0]
signs = tf.sign(grads)
epsilon = tf.constant(0.25)
x_adversarial = tf.add(tf.multiply(epsilon, signs), x)
x_adversarial will be your sneaky image. You can play with the epsilon value, which sets the magnitude of the added noise.
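The same fast-gradient-sign idea can be sketched without TensorFlow. Below is a minimal NumPy illustration on a toy linear softmax classifier; the gradient of the cross-entropy loss with respect to the input is computed by hand, then the input is nudged by epsilon in the direction of its sign. All names here (`W`, `b`, `fgsm`, the random weights) are illustrative assumptions, not the trained network from the question.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_grad_wrt_x(x, W, b, label):
    """Gradient of the cross-entropy loss w.r.t. the input x,
    for a linear model with logits = W.T @ x + b."""
    probs = softmax(W.T @ x + b)
    one_hot = np.zeros_like(probs)
    one_hot[label] = 1.0
    # dL/dlogits = probs - one_hot; chain rule through logits = W.T @ x + b
    return W @ (probs - one_hot)

def fgsm(x, W, b, label, epsilon=0.25):
    """Fast gradient sign method: move each pixel by at most epsilon
    in the direction that increases the loss on `label`."""
    grads = cross_entropy_grad_wrt_x(x, W, b, label)
    return x + epsilon * np.sign(grads)

# Toy setup: random weights standing in for a trained classifier
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))
b = np.zeros(10)
x = rng.uniform(size=784)  # a fake flattened 28x28 image

x_adv = fgsm(x, W, b, label=2, epsilon=0.25)
# Every pixel moves by at most epsilon, so the change is tightly bounded
assert np.max(np.abs(x_adv - x)) <= 0.25 + 1e-9
```

In a real attack you would let `tf.gradients()` do the differentiation through the whole convnet instead of deriving the gradient by hand, but the perturbation step is exactly the same.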
Upvotes: 1