Jason

Reputation: 33

Saving adversarial samples as images and loading them back makes the attack fail

I am testing adversarial sample attacks using DeepFool and SparseFool on the MNIST dataset. The attack succeeds on the preprocessed image data. However, when I save the adversarial sample as an image and then load it back, the attack fails.

I have tested this with both SparseFool and DeepFool, and I think there is a precision problem when I save the sample as an image, but I cannot figure out how to implement it correctly.

from PIL import Image
import numpy as np
import tensorflow as tf

# SparseFool comes from the project linked below.

if __name__ == "__main__":
    # pic_path = 'testSample/img_13.jpg'
    pic_path = "./hacked.jpg"
    model_file = './trained/'

    image = Image.open(pic_path)
    image_array = np.array(image)
    # print(np.shape(image_array))  # 28*28

    shape = (28, 28, 1)
    projection = (0, 1)
    image_norm = tf.cast(image_array / 255.0 - 0.5, tf.float32)
    image_norm = np.reshape(image_norm, shape)  # 28*28*1
    image_norm = image_norm[tf.newaxis, ...]    # 1*28*28*1

    model = tf.saved_model.load(model_file)

    print(np.argmax(model(image_norm)), "nnn")

    fool_img, r, pred_label, fool_label, loops = SparseFool(
        image_norm, projection, model)

    print("pred_label", pred_label)
    print("fool_label", np.argmax(model(fool_img)))

    pert_image = np.reshape(fool_img, (28, 28))
    # print(pert_image)

    # undo the normalization: back to the [0, 255] pixel range
    pert_image = np.copy(pert_image)
    # np.savetxt("pert_image.txt", (pert_image + 0.5) * 255)
    pert_image += 0.5
    pert_image *= 255.

    # reload test in memory:
    # shape = (28, 28, 1)
    # projection = (0, 1)
    # pert_image = tf.cast(pert_image / 255. - 0.5, tf.float32)
    # image_norm = np.reshape(pert_image, shape)  # 28*28*1
    # image_norm = image_norm[tf.newaxis, ...]    # 1*28*28*1
    # print(np.argmax(model(image_norm)), "ffffnnn")

    png = Image.fromarray(pert_image.astype(np.uint8))
    png.save("./hacked.jpg")

The attack should change the prediction from 4 to 9; however, the saved image is still predicted as 4.

The full code project is shared on https://drive.google.com/open?id=132_SosfQAET3c4FQ2I1RS3wXsT_4W5Mw

Upvotes: 0

Views: 249

Answers (1)

Kishor datta gupta

Reputation: 1103

Based on my research, and using this paper as a reference (https://arxiv.org/abs/1607.02533): when adversarial samples are converted to real images, many of them stop working; not all attacks survive in the real world. The paper explains it this way: "This could be explained by the fact that iterative methods exploit more subtle kind of perturbations, and these subtle perturbations are more likely to be destroyed by photo transformation".

For example, suppose your clean image has pixel values 127, 200, 55, .... You divide by 255 (since it is an 8-bit image) and feed your model (0.4980, 0.7843, 0.2157, ...). DeepFool is an advanced attack method: it adds a tiny perturbation and changes the input to (0.4981, 0.7841, 0.2155, ...). This is the adversarial sample that fools your model. But if you try to save it back to an 8-bit image, you multiply by 255 and round back to integers, and you get 127, 200, 55, ... again. The adversarial information is lost.
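Here is a minimal sketch of that round trip (the pixel values and perturbation sizes are made up purely for illustration): a perturbation well below one 8-bit quantization step disappears when the array is converted back to uint8.

import numpy as np

# Hypothetical pixel values, just to illustrate the quantization effect.
clean_uint8 = np.array([127, 200, 55], dtype=np.uint8)

# Normalize to [0, 1] the same way the model input is prepared.
clean = clean_uint8 / 255.0                      # [0.4980, 0.7843, 0.2157]

# A DeepFool-style perturbation is typically much smaller than 1/255.
perturbation = np.array([1e-4, -2e-4, -1e-4])
adversarial = clean + perturbation               # fools the model in memory

# Saving as an 8-bit image quantizes back to integers in [0, 255].
saved_uint8 = np.round(adversarial * 255.0).astype(np.uint8)

# Reloading and normalizing again gives back the clean values.
reloaded = saved_uint8 / 255.0
print(np.array_equal(saved_uint8, clean_uint8))          # True
print(np.max(np.abs(reloaded - adversarial)) < 1 / 255)  # True: perturbation lost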

Simply put, the DeepFool method adds a perturbation so small that it essentially cannot be represented in a real-world 8-bit image.
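As a simplified check you can run on your own output (using the fool_img and image_norm variables from the code in the question), compare the size of the perturbation against the 8-bit quantization step; if it is below roughly half a step per pixel, rounding to uint8 will wipe it out:

import numpy as np

# Perturbation in the normalized input space (the same space the model sees).
perturbation = np.array(fool_img) - np.array(image_norm)
max_pert = np.max(np.abs(perturbation))

print("max perturbation:", max_pert)
print("8-bit quantization step:", 1 / 255.0)

if max_pert < 0.5 / 255.0:
    print("perturbation will be rounded away when saved as an 8-bit image")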

Upvotes: 2
