Mikhila Mehta

Reputation: 5

How to use Tensorboard in AWS Sagemaker

I am following the links below to use TensorBoard with the SageMaker Script Mode method.

https://www.tensorflow.org/tensorboard/get_started

https://levelup.gitconnected.com/how-to-use-tensorboard-in-an-amazon-sagemaker-notebook-instance-a41ce2fd973f

https://towardsdatascience.com/using-tensorboard-in-an-amazon-sagemaker-pytorch-training-job-a-step-by-step-tutorial-19b2b9eb4d1c

Below is the TensorBoard callback in my training script, which is a .py file:

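import datetime

import tensorflow as tf

# create_model, x_train, y_train, x_test and y_test are defined elsewhere in the script.
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Each run is written to its own timestamped subdirectory under logs/fit/.
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
model.fit(x=x_train,
          y=y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])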

In a notebook, I create the TensorFlow estimator below, passing the script above as the entry_point.

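import sagemaker
from sagemaker.tensorflow import TensorFlow

# train_instance_type, model_dir and hyperparameters are defined in earlier cells.
estimator = TensorFlow(
    entry_point='Script_File.py',
    train_instance_type=train_instance_type,
    train_instance_count=1,
    model_dir=model_dir,
    hyperparameters=hyperparameters,
    role=sagemaker.get_execution_role(),
    base_job_name='tf-fashion-mnist',
    framework_version='1.12.0',
    py_version='py3',
    output_path='<S3 Path>',  # placeholder for the S3 output URI
    script_mode=True,
)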

I start the training from my notebook with:

estimator.fit(inputs)

Once training is done, I run the command below in a terminal (I have also tried it in a notebook cell) to launch TensorBoard:

tensorboard --logdir logs/fit

However, TensorBoard shows no graphs; it only displays the message "Failed to fetch runs". Am I missing something, or do I need additional settings in my script to see my logs in TensorBoard?

Upvotes: 0

Views: 1601

Answers (1)

rok

Reputation: 2765

Your TensorBoard log directory is not logs/fit: the script appends the current date and time to it. Try using logs/fit itself as log_dir and see whether that works.
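To check which directories actually contain TensorBoard event files (files whose names start with `events.out.tfevents.`), a small stdlib-only sketch like the one below may help. The helper name `find_run_dirs` is hypothetical. If it finds nothing under logs/fit on the notebook instance, the event files were probably never written there (the training job runs in its own container), which would also explain "Failed to fetch runs".

```python
import os

# Filename prefix used by TensorBoard event files.
EVENT_FILE_PREFIX = "events.out.tfevents."


def find_run_dirs(root):
    """Return the sorted directories under `root` that contain at least one event file."""
    run_dirs = set()
    # os.walk yields nothing if `root` does not exist, so this returns [] in that case.
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.startswith(EVENT_FILE_PREFIX) for name in filenames):
            run_dirs.add(dirpath)
    return sorted(run_dirs)


if __name__ == "__main__":
    runs = find_run_dirs("logs/fit")
    if not runs:
        print("No event files found under logs/fit")
    for run in runs:
        print(run)
```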

EDIT

If you want to run TensorBoard locally, you have to send the TensorBoard logs to S3 and read them from there. To do this, follow your third linked example and use the SageMaker Debugger configuration:

from sagemaker.debugger import TensorBoardOutputConfig

tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path='s3://path/for/tensorboard/data/emission',
    container_local_output_path='/local/path/for/tensorboard/data/emission',
)
# Pass it to the estimator: TensorFlow(..., tensorboard_output_config=tensorboard_output_config)
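As I understand the SageMaker Python SDK, it is the contents of `container_local_output_path` that get uploaded to `s3_output_path`, so the callback in the training script has to write into that same directory rather than into a relative logs/fit. A minimal sketch, assuming the placeholder path from the config above; the helper `tensorboard_log_dir` is hypothetical:

```python
import datetime
import os

# Placeholder copied from the TensorBoardOutputConfig above; replace with your own path.
CONTAINER_TB_DIR = "/local/path/for/tensorboard/data/emission"


def tensorboard_log_dir(base_dir=CONTAINER_TB_DIR, now=None):
    """Return a timestamped run directory under `base_dir` (hypothetical helper)."""
    now = now or datetime.datetime.now()
    # Same timestamp format as the question's script: YYYYmmdd-HHMMSS.
    return os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))


# In the training script (requires TensorFlow), something like:
#   tensorboard_callback = tf.keras.callbacks.TensorBoard(
#       log_dir=tensorboard_log_dir(), histogram_freq=1)
```

Keeping one timestamped subdirectory per run means several training jobs can share the same S3 prefix and still show up as separate runs in TensorBoard.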

Your TensorBoard command will then be something like:

AWS_REGION=<your-region> AWS_LOG_LEVEL=3 tensorboard --logdir s3://path/for/tensorboard/data/emission
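If you would rather launch TensorBoard from Python (for example from a notebook cell), the same environment variables and arguments can be assembled and handed to `subprocess`. This is a sketch: the helper `tensorboard_s3_command` is hypothetical, and whether TensorBoard can read `s3://` paths directly depends on your TensorBoard/TensorFlow version and its S3 support being installed.

```python
import os


def tensorboard_s3_command(s3_logdir, region):
    """Build (env, argv) equivalent to `AWS_REGION=... AWS_LOG_LEVEL=3 tensorboard --logdir s3://...`."""
    if not s3_logdir.startswith("s3://"):
        raise ValueError("expected an s3:// URI, got %r" % s3_logdir)
    # Copy the current environment and add the two variables from the shell command.
    env = dict(os.environ, AWS_REGION=region, AWS_LOG_LEVEL="3")
    argv = ["tensorboard", "--logdir", s3_logdir]
    return env, argv
```

Usage: `env, argv = tensorboard_s3_command("s3://path/for/tensorboard/data/emission", "<your-region>")`, then `subprocess.Popen(argv, env=env)` to run it in the background. Passing the variables through `env` also avoids the shell pitfall of a space after `=`.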

Alternatively, if you want to use TensorBoard on the notebook instance itself, follow your second linked example: install TensorBoard in a cell, start it there, and open it through the notebook instance's proxy with a URL like:

https://<notebook instance hostname>/proxy/6006/
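TensorBoard serves on port 6006 by default, and the notebook instance's Jupyter proxy exposes a local port under `/proxy/<port>/`. A tiny sketch that builds this URL; the helper `notebook_proxy_url` and the example hostname are hypothetical:

```python
def notebook_proxy_url(hostname, port=6006):
    """Return the Jupyter proxy URL for `port` on a notebook instance (hypothetical helper)."""
    # Drop any scheme or trailing slash the caller may have included.
    host = hostname.split("://", 1)[-1].rstrip("/")
    return "https://%s/proxy/%d/" % (host, port)
```

If you start TensorBoard with `--port`, pass the same port here.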

Upvotes: 1
