elizabeth

Reputation: 1

Evaluating DQN, Vehicle Routing Problem (VRP)

I am running a DQN algorithm that tries to minimize the total distance traveled by a vehicle (VRP). In training, as you can see in the images, everything works fine: the loss is decreasing, the average length is decreasing, and the reward is increasing.

However, in the evaluation phase the model behaves unexpectedly. I run 100 evaluation iterations. In the first run the results are good, but subsequent evaluation runs sometimes give good results and sometimes very bad ones. In the good runs I get a minimum total distance (min length) of 4, but sometimes the evaluation returns a minimum of 13, even though it is done on the same trained model.

So my question is: is this normal behavior? And is there a way to improve these evaluation results?

P.S:

Here's an example of the evaluation output: shortest avg length found: 5.406301895156503 (this is the value from training). Here are two example solutions from evaluation:

Solution 1:

[0, 1, 9, 4, 2, 3, 5, 0, 6, 7, 8, 10]
length 4.955087028443813

Solution 2:

[0, 4, 9, 3, 13, 0, 7, 13, 0, 10, 0, 6, 11, 5, 12, 1, 12, 0, 2, 12, 0, 8, 0]
length 10.15813521668315

The first 100 evaluations are similar to solution 1, but when I rerun the evaluation for another 100, I get results similar to solution 2.

Upvotes: 0

Views: 157

Answers (1)

ndrwnaguib

Reputation: 6135

Adding the source code would definitely be helpful. There could be several reasons:

  1. Do you shuffle the training data?
  2. How is the reward function designed? Is it a function of the duality gap?
  3. Is CUDA configured to be deterministic?
  4. Do you put your model in eval mode before the evaluation step?
  5. What is the density of the unexpected results across all evaluation iterations? Perhaps the model only needs more and longer episodes, or your model is overfitting.
  6. How are the training & test data split? The geometry of the VRP or TSP instances could have an impact.
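
To put a number on point 5, a small diagnostic along these lines might help. This is only a sketch, not taken from your code: the helper names `repeated_customers` and `bad_run_fraction` are hypothetical, node 0 is assumed to be the depot, and the threshold of 6.0 is an arbitrary cut-off chosen for illustration. It also assumes each customer is meant to be visited exactly once, which may not hold for your VRP variant.

```python
from collections import Counter


def repeated_customers(route, depot=0):
    """Return {customer: visit_count} for customers visited more than once.

    The depot is excluded because a VRP route may return to it several times.
    """
    counts = Counter(node for node in route if node != depot)
    return {node: n for node, n in counts.items() if n > 1}


def bad_run_fraction(lengths, threshold):
    """Fraction of evaluation runs whose route length exceeds `threshold`."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(length > threshold for length in lengths) / len(lengths)


# The two routes posted in the question.
solution_1 = [0, 1, 9, 4, 2, 3, 5, 0, 6, 7, 8, 10]
solution_2 = [0, 4, 9, 3, 13, 0, 7, 13, 0, 10, 0, 6, 11, 5,
              12, 1, 12, 0, 2, 12, 0, 8, 0]

print(repeated_customers(solution_1))  # {}
print(repeated_customers(solution_2))  # {13: 2, 12: 3}

# Assumed cut-off of 6.0 separating "good" from "bad" route lengths.
print(bad_run_fraction([4.955087028443813, 10.15813521668315], threshold=6.0))  # 0.5
```

If the bad runs consistently contain repeated customers, as solution 2 does here, that would point toward how actions are selected or masked at evaluation time rather than toward overfitting, but that is a guess without seeing the source.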

Upvotes: 0
