netskink

Reputation: 4539

How to use the TensorBoard debugger with Datalab, which uses tf.estimator, on Google Cloud Platform

When I start TensorBoard via Datalab, it uses the Google syntax which is described here. This document only mentions start, stop and list. However, there is a debugger pane which I cannot use.

This document describes how to use tensorboard debugger with a tf.estimator but it uses a different syntax.

Is there some way to blend the two so the debugger is usable with Datalab?

Upvotes: 0

Views: 285

Answers (1)

netskink

Reputation: 4539

I don't think you can run tfdbg in Datalab. Instead, you can take the code and run it at the console, following this guide:

  1. I am using the datalab notebook which uses a model.py and task.py. My code originally was modeled after this file.

  2. Make this change to the model.py code as shown in the guide mentioned above.

    from tensorflow.python import debug as tf_debug
    # for debugging
    hooks = [tf_debug.LocalCLIDebugHook()]
    

Then, in the train_and_evaluate(args) routine, pass the hooks in the parameter list of the EvalSpec() call, like so:

    # .. also need an EvalSpec which controls the evaluation and
    # the checkpointing of the model since they happen at the same time
    eval_spec = tf.estimator.EvalSpec(
        input_fn = read_dataset(
            args['eval_data_paths'],
            batch_size = 10000,  # original 10000
            mode = tf.estimator.ModeKeys.EVAL),
        steps = None, # None evaluates until the input is exhausted
        start_delay_secs = args['eval_delay_secs'], # start evaluating after N seconds.
        throttle_secs = args['min_eval_frequency'], # eval no more than every N seconds.
        exporters = exporter, # how to export the model for production.
        hooks = hooks) # for the debugger
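Since the hook opens an interactive prompt, it may be convenient to switch it on only for console runs. The following is a minimal sketch under that assumption; the build_hooks helper and its debug flag are hypothetical names and not part of the original model.py.

```python
def build_hooks(debug=False):
    """Return the list of hooks to pass to EvalSpec(hooks=...).

    `debug` is a hypothetical switch, not part of the original model.py.
    """
    if not debug:
        # No hooks: suitable for non-interactive runs such as a notebook.
        return []
    # Imported lazily so non-debug runs do not load the debugger module.
    from tensorflow.python import debug as tf_debug
    return [tf_debug.LocalCLIDebugHook()]
```

With this in place, hooks = build_hooks(debug=True) would replace the hard-coded list when running from the console.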

Then, using your preferred Python virtual environment, do the following (I am using Anaconda):

  1. Open a python 2.7 environment with anaconda

    $ . ~/bin/setenv-anaconda2.sh
    
  2. Activate the tensorflow python2.7 anaconda environment

    $ conda activate tensorflow
    
  3. Set up the gcloud environment

    $ . ~/progs/datalab-notebooks/bin/setenv_google.sh
    
  4. For this model, set a python path to find modules

    cd ~/progs/datalab-notebooks/tf-debug
    export PYTHONPATH=${PYTHONPATH}:${PWD}/taxisimple
    

Then run this to train. The --train_steps=1000 value appears to be the maximum number of steps.

    python -m trainer.task \
       --train_data_paths="${PWD}/taxi-train*" \
       --eval_data_paths=${PWD}/taxi-valid.csv  \
       --output_dir=${PWD}/taxi_trained \
       --train_steps=1000 --job-dir=./tmp
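If you would rather launch the same run from a Python script, the PYTHONPATH export and the command above can be sketched roughly as follows. The build_train_command helper is a hypothetical name introduced here for illustration; the paths and flags simply mirror the shell commands above.

```python
import os
import sys


def build_train_command(workdir, train_steps=1000):
    """Return (argv, env) mirroring the shell command and PYTHONPATH export.

    Hypothetical helper: `workdir` plays the role of ${PWD} above.
    """
    argv = [
        sys.executable, "-m", "trainer.task",
        # The glob is passed through literally, as in the quoted shell form.
        "--train_data_paths=" + os.path.join(workdir, "taxi-train*"),
        "--eval_data_paths=" + os.path.join(workdir, "taxi-valid.csv"),
        "--output_dir=" + os.path.join(workdir, "taxi_trained"),
        "--train_steps={}".format(train_steps),
        "--job-dir=./tmp",
    ]
    # Copy the current environment and append the taxisimple directory.
    env = dict(os.environ)
    extra = os.path.join(workdir, "taxisimple")
    parts = [p for p in (env.get("PYTHONPATH", ""), extra) if p]
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return argv, env
```

The result could then be handed to subprocess.call(argv, env=env, cwd=workdir) from a real terminal, since the tfdbg prompt is interactive.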

This will give you a tfdbg prompt. From here you can explore the model using tfdbg.

Upvotes: 1
