Reputation: 4207
I am trying to load a tf-agents policy I saved via

try:
    PolicySaver(collect_policy).save(model_dir + 'collect_policy')
except TypeError:
    tf.saved_model.save(collect_policy, model_dir + 'collect_policy')
Quick explanation for the try/except block: when I originally create the policy, I can save it with PolicySaver, but once I load it again for another training run, it is a SavedModel and can therefore no longer be saved with PolicySaver.
This seems to work fine, but now I want to use this policy for self-play, so I load it with self.policy = tf.saved_model.load(policy_path) in my AIPlayer class. When I try to use it for prediction, however, it fails. Here's the (testing) code:
def decide(self, table):
    state = table.getState()
    timestep = ts.restart(np.array([state], dtype=np.float64))
    prediction = self.policy.action(timestep)
    print(prediction)
The table passed into the function contains the state of the game, and the ts.restart() function is copied from my custom PyEnvironment, so the timestep is constructed exactly the same way as it would be in the environment. However, I get the following error message for the line prediction = self.policy.action(timestep):
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
* TimeStep(step_type=<tf.Tensor 'time_step:0' shape=() dtype=int32>, reward=<tf.Tensor 'time_step_1:0' shape=() dtype=float32>, discount=<tf.Tensor 'time_step_2:0' shape=() dtype=float32>, observation=<tf.Tensor 'time_step_3:0' shape=(1, 79) dtype=float64>)
* ()
Keyword arguments: {}
Expected these arguments to match one of the following 2 option(s):
Option 1:
Positional arguments (2 total):
* TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step/step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/discount'), observation=TensorSpec(shape=(None,
79), dtype=tf.float64, name='time_step/observation'))
* ()
Keyword arguments: {}
Option 2:
Positional arguments (2 total):
* TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation=TensorSpec(shape=(None, 79), dtype=tf.float64, name='observation'))
* ()
Keyword arguments: {}
What am I doing wrong? Is it really just the tensor names or are the shapes the problem and how can I change that?
Any ideas how to further debug this are appreciated.
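For reference, the check that fails here can be mimicked with a tiny shape-compatibility helper (a simplified sketch, not the actual SavedModel signature-matching code; spec_compatible is a name I made up): a None in a spec shape matches any size, but the ranks must agree, so the rank-0 step_type/reward/discount from the traceback can never match the (None,)-shaped specs.

```python
def spec_compatible(value_shape, spec_shape):
    """Simplified stand-in for the SavedModel signature check: a spec
    dimension of None matches any size, but the rank must be equal."""
    if len(value_shape) != len(spec_shape):
        return False
    return all(s is None or s == v for v, s in zip(value_shape, spec_shape))

# The 'Got' shapes from the traceback vs Option 1's expected shapes:
print(spec_compatible((), (None,)))          # step_type: rank 0 vs rank 1 -> False
print(spec_compatible((1, 79), (None, 79)))  # observation -> True
```

This suggests the shapes, not the tensor names, are what the loader rejects.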
Upvotes: 8
Views: 4823
Reputation: 4207
I got it to work by constructing the TimeStep manually, giving every field the leading batch dimension that the SavedModel's (None,)-shaped specs expect:
# Wrapping each value in a list gives every tensor a batch
# dimension of 1, matching the (None,) / (None, 79) specs.
step_type = tf.convert_to_tensor([0], dtype=tf.int32, name='step_type')
reward = tf.convert_to_tensor([0], dtype=tf.float32, name='reward')
discount = tf.convert_to_tensor([1], dtype=tf.float32, name='discount')
observations = tf.convert_to_tensor([state], dtype=tf.float64, name='observations')
timestep = ts.TimeStep(step_type, reward, discount, observations)
prediction = self.policy.action(timestep)
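To see the difference purely in terms of shapes, here is a numpy-only sketch (no TensorFlow needed; the 79-wide state is taken from the (None, 79) observation spec in the error message):

```python
import numpy as np

# Hypothetical example: a 79-feature observation vector, matching
# the (None, 79) observation spec from the error message.
state = np.zeros(79, dtype=np.float64)

# What ts.restart() built for a single environment: rank-0 scalars,
# which cannot match specs of shape (None,).
unbatched_step_type = np.asarray(0, dtype=np.int32)  # shape ()

# What the manual TimeStep builds: every field gets a batch dim of 1.
step_type = np.asarray([0], dtype=np.int32)          # shape (1,)
reward = np.asarray([0.0], dtype=np.float32)         # shape (1,)
discount = np.asarray([1.0], dtype=np.float32)       # shape (1,)
observation = np.asarray([state], dtype=np.float64)  # shape (1, 79)
```

(If I remember correctly, newer tf-agents versions also accept a batch_size argument to ts.restart(), which would produce the batched fields directly, but I haven't verified that against the version used here.)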
Upvotes: 7