dxy159

Reputation: 1

Machine Learning (Adversarial Images)

In case any of you don't know, adversarial images are images that belong to one class but have been perturbed, with no difference perceptible to the human eye, so that the network confidently classifies them as a completely different class.

More information about it here: http://karpathy.github.io/2015/03/30/breaking-convnets/

I have learned a lot about convolutional neural networks using TensorFlow. Here is my network:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])  # flattened 28x28 MNIST images

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1,28,28,1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)


W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)


W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)


W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
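As a sanity check on the shapes above: the 7 * 7 * 64 size of the fully connected layer follows from two rounds of 2x2 max pooling with SAME padding, where the output side length is ceil(input / stride), so 28 -> 14 -> 7. In plain Python:

```python
import math

def pooled_size(size, stride=2):
    # SAME-padded pooling with stride 2: output side is ceil(size / stride)
    return math.ceil(size / stride)

side = 28
for _ in range(2):  # two max_pool_2x2 layers
    side = pooled_size(side)

print(side, side * side * 64)  # 7 and 3136, i.e. 7 * 7 * 64
```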

The challenge is to take an image of the number 2, labelled as '2', and perturb it so that the network identifies it as '6', changing the pixels so slightly that the difference is unrecognizable.

Anyone have any idea where to start with this?

Upvotes: 0

Views: 322

Answers (1)

Robert Lacok

Reputation: 4334

You can start by reading this paper, for example: https://arxiv.org/abs/1412.6572 (Explaining and Harnessing Adversarial Examples)

It explains one of the ways to generate adversarial examples by computing gradients of the loss function with respect to inputs.

Have a look at tf.gradients().

Once you have defined your loss function (cross-entropy, for example), you do something like:

grads = tf.gradients(loss, [x])[0]  # gradient of the loss w.r.t. the input image
signs = tf.sign(grads)              # keep only the sign of each pixel's gradient
epsilon = tf.constant(0.25)         # magnitude of the perturbation
x_adversarial = tf.add(tf.multiply(epsilon, signs), x)  # x + epsilon * sign(grad)

x_adversarial will be your sneaky image. You can play with the epsilon value, which sets the magnitude of the added noise.
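To see what that update does numerically, here is a small NumPy sketch of the x + epsilon * sign(grad) step (the gradient values below are invented for illustration; in practice they come from tf.gradients as above):

```python
import numpy as np

# Toy version of the update: each pixel moves by at most epsilon in the
# direction that increases the loss.
x = np.array([0.20, 0.50, 0.90])
grads = np.array([-0.03, 0.00, 0.12])
epsilon = 0.25

x_adversarial = x + epsilon * np.sign(grads)
print(x_adversarial)  # pixels become -0.05, 0.5, 1.15
```

Note that the perturbation can push pixels outside the valid range, so you may want to clip the result back to [0, 1] afterwards, e.g. with tf.clip_by_value (or np.clip here).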

Upvotes: 1
