Reputation: 28372
All links referenced in the question below are described as of 2021/05/31.
I have trained a deep Q network on a custom problem, following the TF Agents tutorial. Now I would like to feed it some hand-crafted observations to see which actions it recommends. I have utility functions for creating these feature vectors, which I use in my PyEnvironment. However, I am not sure how to convert them into something the trained network can consume.
What I would like to have is something like the following:
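In pseudocode (the names here are just placeholders, not real TF-Agents calls):
# my own helper that builds a hand-crafted feature vector
observation = build_hand_crafted_observation()
# ask the trained network which action it would pick for that observation
recommended_action = trained_network.choose_action(observation)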
My environment has a stochastic component, so I want to manually modify the environment state rather than have the agent explicitly take a path through the environment.
To make progress on this question, I have been examining this tutorial on policies. It looks like my use case might be similar to the "Random TF Policy" section or the "Actor Policies" section below it. However, in my case I have a loaded agent and Python (non-TF) observation, time-step, and action specs. What is the right way to drive my network to produce actions from these components?
Here is something I have tried:
import tensorflow as tf
from tf_agents.trajectories import time_step as ts

saved_policy = tf.compat.v2.saved_model.load(policy_dir)
# get_feat_vector returns a numpy.ndarray
observation = tf.convert_to_tensor(state.get_feat_vector(), dtype=tf.float32)
time_step = ts.restart(observation)
action_step = saved_policy.action(time_step)
and the associated error message:
File "/home/---/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 267, in restored_function_body
raise ValueError(
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
* TimeStep(step_type=<tf.Tensor 'time_step:0' shape=() dtype=int32>, reward=<tf.Tensor 'time_step_1:0' shape=() dtype=float32>, discount=<tf.Tensor 'time_step_2:0' shape=() dtype=float32>, observation=<tf.Tensor 'time_step_3:0' shape=(170,) dtype=float32>)
* ()
Keyword arguments: {}
Expected these arguments to match one of the following 2 option(s):
Option 1:
Positional arguments (2 total):
* TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation=TensorSpec(shape=(None, 170), dtype=tf.float32, name='observation'))
* ()
Keyword arguments: {}
Option 2:
Positional arguments (2 total):
* TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step/step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/discount'), observation=TensorSpec(shape=(None, 170), dtype=tf.float32, name='time_step/observation'))
* ()
Keyword arguments: {}
Upvotes: 2
Views: 333
Reputation: 815
I believe your problem might be with how you are saving and loading the model. TF-Agents recommends using the PolicySaver (see here). So maybe try running code like:
from tf_agents.policies import policy_saver

tf_agent = ...  # your agent
tf_policy_saver = policy_saver.PolicySaver(policy=tf_agent.policy)
...  # train the agent
tf_policy_saver.save(export_dir=policy_dir_path)
and then load and run the model with:
from tf_agents.policies import py_tf_eager_policy

# Wrap the saved policy so it can be driven with Python (numpy) time steps.
eager_py_policy = py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    policy_dir, env.time_step_spec(), env.action_spec())

policy_state = eager_py_policy.get_initial_state(1)
time_step = env.reset()
action_step = eager_py_policy.action(time_step, policy_state)
time_step = env.step(action_step.action)
policy_state = action_step.state
Or whatever manual thing you want to do with the environment and observations.
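For the hand-crafted observations from your question, something along these lines should work with the eager policy (a minimal sketch: get_feat_vector is your own helper, and I'm assuming the observation spec is a flat float32 vector so ts.restart produces a matching unbatched Python TimeStep):

import numpy as np
from tf_agents.trajectories import time_step as ts

# Build an unbatched Python/numpy TimeStep from the hand-crafted feature vector.
observation = np.asarray(state.get_feat_vector(), dtype=np.float32)
manual_time_step = ts.restart(observation)

# SavedModelPyTFEagerPolicy consumes Python time steps directly,
# so there is no need to convert to tensors or add a batch dimension by hand.
action_step = eager_py_policy.action(manual_time_step)
print(action_step.action)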
Upvotes: 2