Jason

Reputation: 33

Saving adversarial samples as images and loading them back makes the attack fail

I am testing adversarial sample attacks using DeepFool and SparseFool on the MNIST dataset. The attack succeeds on the preprocessed image data. However, when I save the adversarial sample as an image and then load it back, the attack fails.

I have tested this with both SparseFool and DeepFool, and I think there is a precision problem when I save the sample as an image, but I cannot figure out how to implement it correctly.

from PIL import Image
import numpy as np
import tensorflow as tf

# SparseFool comes from the project linked below.

if __name__ == "__main__":
    # pic_path = 'testSample/img_13.jpg'
    pic_path = "./hacked.jpg"
    model_file = './trained/'

    image = Image.open(pic_path)
    image_array = np.array(image)
    # print(np.shape(image_array))  # 28*28

    shape = (28, 28, 1)
    projection = (0, 1)
    image_norm = tf.cast(image_array / 255.0 - 0.5, tf.float32)
    image_norm = np.reshape(image_norm, shape)  # 28*28*1
    image_norm = image_norm[tf.newaxis, ...]    # 1*28*28*1

    model = tf.saved_model.load(model_file)

    print(np.argmax(model(image_norm)), "nnn")

    fool_img, r, pred_label, fool_label, loops = SparseFool(
        image_norm, projection, model)

    print("pred_label", pred_label)
    print("fool_label", np.argmax(model(fool_img)))

    pert_image = np.reshape(fool_img, (28, 28))
    # print(pert_image)

    # undo the normalization: back to the [0, 255] pixel range
    pert_image = np.copy(pert_image)
    # np.savetxt("pert_image.txt", (pert_image + 0.5) * 255)
    pert_image += 0.5
    pert_image *= 255.

    # reload test in memory:
    # shape = (28, 28, 1)
    # projection = (0, 1)
    # pert_image = tf.cast(pert_image / 255. - 0.5, tf.float32)
    # image_norm = np.reshape(pert_image, shape)  # 28*28*1
    # image_norm = image_norm[tf.newaxis, ...]    # 1*28*28*1
    # print(np.argmax(model(image_norm)), "ffffnnn")

    png = Image.fromarray(pert_image.astype(np.uint8))
    png.save("./hacked.jpg")

The attack should change the prediction from 4 to 9; however, the saved image is still predicted as 4.

The full code project is shared on https://drive.google.com/open?id=132_SosfQAET3c4FQ2I1RS3wXsT_4W5Mw

Upvotes: 0

Views: 249

Answers (1)

Kishor datta gupta

Reputation: 1103

Based on my research, and using this paper as a reference (https://arxiv.org/abs/1607.02533): when adversarial samples are converted to real images, many of them stop working; not all attacks survive in the real world. The paper explains it this way: "This could be explained by the fact that iterative methods exploit more subtle kind of perturbations, and these subtle perturbations are more likely to be destroyed by photo transformation".

For example, suppose your clean image has pixel values 127, 200, 55, .... You divide by 255 (since it is an 8-bit image) and feed your model (0.4980, 0.7843, 0.2157, ...). DeepFool is an advanced attack method: it adds a tiny perturbation and changes the input to (0.4981, 0.7841, 0.2155, ...). This is the adversarial sample that fools your model. But if you try to save it back to an 8-bit image, you multiply by 255 and round back to integers, and you get 127, 200, 55, ... again. The adversarial information is lost.
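Here is a minimal sketch of that round trip (the pixel values and perturbation sizes are made up purely for illustration): a perturbation well below one 8-bit quantization step disappears when the array is converted back to uint8.

import numpy as np

# Hypothetical pixel values, just to illustrate the quantization effect.
clean_uint8 = np.array([127, 200, 55], dtype=np.uint8)

# Normalize to [0, 1] the same way the model input is prepared.
clean = clean_uint8 / 255.0                      # [0.4980, 0.7843, 0.2157]

# A DeepFool-style perturbation is typically much smaller than 1/255.
perturbation = np.array([1e-4, -2e-4, -1e-4])
adversarial = clean + perturbation               # fools the model in memory

# Saving as an 8-bit image quantizes back to integers in [0, 255].
saved_uint8 = np.round(adversarial * 255.0).astype(np.uint8)

# Reloading and normalizing again gives back the clean values.
reloaded = saved_uint8 / 255.0
print(np.array_equal(saved_uint8, clean_uint8))          # True
print(np.max(np.abs(reloaded - adversarial)) < 1 / 255)  # True: perturbation lost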

Simply put, the DeepFool method adds a perturbation so small that it essentially cannot be represented in a real-world 8-bit image.
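As a simplified check you can run on your own output (using the fool_img and image_norm variables from the code in the question), compare the size of the perturbation against the 8-bit quantization step; if it is below roughly half a step per pixel, rounding to uint8 will wipe it out:

import numpy as np

# Perturbation in the normalized input space (the same space the model sees).
perturbation = np.array(fool_img) - np.array(image_norm)
max_pert = np.max(np.abs(perturbation))

print("max perturbation:", max_pert)
print("8-bit quantization step:", 1 / 255.0)

if max_pert < 0.5 / 255.0:
    print("perturbation will be rounded away when saved as an 8-bit image")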

Upvotes: 2
