Setjmp

Reputation: 28372

TF Agents: How to feed faked observations into a trained deep Q network model to examine which actions it chooses?

All descriptions of linked content referenced in the question below are as of 2021/05/31.

I have trained a deep Q network on a custom problem, following the TF Agents tutorial. Now I would like to feed it some hand-crafted observations to see which actions it recommends. I have utility functions for creating these feature vectors, which I use in my PyEnvironment. However, I am not sure how to convert them into something I can feed to the network.

What I would like to have is something like the following:

  1. Feed in an initial state, and see the recommended action from the network.
  2. Manually alter the state, and see what the network recommends next.
  3. And so on...

My environment has a stochastic component, so I want to manually modify the environment state rather than have the agent explicitly take a path through the environment.

To make progress on this question, I have been examining this tutorial on policies. It looks like my use case might be similar to the "Random TF Policy" section or the "Actor policies" section below it. However, in my case I have a loaded agent and Python (non-TF) observation, time step, and action specs. What is the ideal approach for driving my network to produce actions from these components?

Here is something I have tried:

import tensorflow as tf
from tf_agents.trajectories import time_step as ts

saved_policy = tf.compat.v2.saved_model.load(policy_dir)
# get_feat_vector returns a numpy.ndarray
observation = tf.convert_to_tensor(state.get_feat_vector(), dtype=tf.float32)
time_step = ts.restart(observation)
action_step = saved_policy.action(time_step)

and the associated error message:

File "/home/---/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 267, in restored_function_body
    raise ValueError(
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(step_type=<tf.Tensor 'time_step:0' shape=() dtype=int32>, reward=<tf.Tensor 'time_step_1:0' shape=() dtype=float32>, discount=<tf.Tensor 'time_step_2:0' shape=() dtype=float32>, observation=<tf.Tensor 'time_step_3:0' shape=(170,) dtype=float32>)
    * ()
  Keyword arguments: {}

Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation=TensorSpec(shape=(None, 170), dtype=tf.float32, name='observation'))
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step/step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step/discount'), observation=TensorSpec(shape=(None, 170), dtype=tf.float32, name='time_step/observation'))
    * ()
  Keyword arguments: {}

Upvotes: 2

Views: 333

Answers (1)

Federico Malerba

Reputation: 815

I believe your problem might be with how you are saving and loading the model. TF-Agents recommends using the PolicySaver (see here), so maybe try running code like:

from tf_agents.policies import policy_saver

tf_agent = ...
tf_policy_saver = policy_saver.PolicySaver(policy=tf_agent.policy)

... # train the agent

tf_policy_saver.save(export_dir=policy_dir_path)

and then load and run the model with:

from tf_agents.policies import py_tf_eager_policy

# Wrap the SavedModel as a PyPolicy that works directly with a Python (non-TF) environment.
eager_py_policy = py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    policy_dir, env.time_step_spec(), env.action_spec())

policy_state = eager_py_policy.get_initial_state(1)
time_step = env.reset()
action_step = eager_py_policy.action(time_step, policy_state)
time_step = env.step(action_step.action)
policy_state = action_step.state

Or whatever manual thing you want to do with the environment and observations.
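If instead you want to bypass the environment entirely and feed a hand-crafted observation to the raw SavedModel (as in the question), note that the error message shows the loaded policy expects a leading batch dimension (shapes like (None, 170)), while the question passes an unbatched TimeStep. Below is a minimal sketch of batching a single observation by hand, reusing policy_dir and get_feat_vector from the question:

import tensorflow as tf
from tf_agents.trajectories import time_step as ts

saved_policy = tf.compat.v2.saved_model.load(policy_dir)

# Hand-crafted observation, shape (170,) as in the question.
observation = tf.convert_to_tensor(state.get_feat_vector(), dtype=tf.float32)

# Add the batch dimension the SavedModel expects: (1, 170).
batched_observation = tf.expand_dims(observation, axis=0)

# batch_size=1 also batches step_type/reward/discount, matching the (None,) specs.
time_step = ts.restart(batched_observation, batch_size=1)

action_step = saved_policy.action(time_step)
print(action_step.action.numpy())

The SavedModelPyTFEagerPolicy approach above should take care of this batching for you, which is why it is generally the more convenient route.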

Upvotes: 2
