user19826638

Reputation: 61

Learning of DQN with noise data

I'm running some experiments with DQN on a simple navigation task with a binary reward at the end of the episode. DQN works perfectly well there. Now I'm thinking of perturbing the reward, so that 10% of the time the binary reward is inverted. Will this significantly disturb the TD updates? Is there any theoretical basis for whether DQN can survive this kind of adversarial attack on the training data?
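Concretely, the perturbation I have in mind is something like the sketch below (assuming a Gymnasium-style environment with a binary 0/1 terminal reward; the wrapper name and `flip_prob` are just illustrative):

```python
import gymnasium as gym
import numpy as np


class RewardFlipWrapper(gym.Wrapper):
    """Invert a binary {0, 1} reward with probability flip_prob."""

    def __init__(self, env, flip_prob=0.1, seed=None):
        super().__init__(env)
        self.flip_prob = flip_prob
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # with probability flip_prob, flip the binary reward (0 <-> 1)
        if self.rng.random() < self.flip_prob:
            reward = 1.0 - reward
        return obs, reward, terminated, truncated, info
```

The DQN agent would then be trained on `RewardFlipWrapper(env, flip_prob=0.1)` instead of the clean environment.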

Upvotes: 0

Views: 74

Answers (1)

gehirndienst

Reputation: 463

I don't see much sense in it. A reward signal is a form of behavior control: you usually want to stabilize your agent's behavior so that it matches your expectations, not destabilize it. What you probably want instead is to introduce noise elsewhere to add internal stochasticity to the process. For that there is already a classic paper introducing noisy DQNs (NoisyNets); it is also one of the components of the Rainbow algorithm.
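For illustration, the core building block of that approach is a linear layer whose weights get learnable factorised Gaussian noise. A rough PyTorch sketch (the class name, `sigma0`, and the initialisation constants are my assumptions, not code from the paper) could look like this:

```python
import math
import torch
import torch.nn as nn


class NoisyLinear(nn.Module):
    """Linear layer with learnable factorised Gaussian noise (NoisyNet-style sketch)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        # learnable means and noise scales for weights and biases
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # noise buffers, resampled on every training forward pass
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 * bound)
        nn.init.constant_(self.bias_sigma, sigma0 * bound)

    @staticmethod
    def _f(x):
        # factorised-noise transform f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        if self.training:
            self.eps_in.normal_()
            self.eps_out.normal_()
            w_eps = torch.outer(self._f(self.eps_out), self._f(self.eps_in))
            weight = self.weight_mu + self.weight_sigma * w_eps
            bias = self.bias_mu + self.bias_sigma * self._f(self.eps_out)
        else:
            # at evaluation time use the mean parameters only
            weight, bias = self.weight_mu, self.bias_mu
        return nn.functional.linear(x, weight, bias)
```

Replacing the fully connected layers of the Q-network with such layers lets the agent learn how much exploration noise to inject, instead of corrupting the reward signal itself.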

Upvotes: 0
