I am following the links below to use TensorBoard with the SageMaker Script Mode method.
https://www.tensorflow.org/tensorboard/get_started
Below is the TensorBoard callback in my training script (a .py file):
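```python
import datetime
import tensorflow as tf

# create_model(), x_train, y_train, x_test and y_test are defined elsewhere in the script
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Each run writes to its own timestamped sub-directory under logs/fit/
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(x=x_train,
          y=y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])
```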
In a notebook, I create the TensorFlow estimator below, passing the script above as the entry_point:
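```python
import sagemaker
from sagemaker.tensorflow import TensorFlow

# train_instance_type, model_dir and hyperparameters are defined earlier in the notebook
estimator = TensorFlow(
    entry_point='Script_File.py',
    train_instance_type=train_instance_type,
    train_instance_count=1,
    model_dir=model_dir,
    hyperparameters=hyperparameters,
    role=sagemaker.get_execution_role(),
    base_job_name='tf-fashion-mnist',
    framework_version='1.12.0',
    py_version='py3',
    output_path=<S3 Path>,  # placeholder for the S3 output location
    script_mode=True,
)
```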
I start the training from my notebook with:

```python
estimator.fit(inputs)
```
Once training is done, I launch TensorBoard from a terminal (I have also tried it in a notebook cell):

```shell
tensorboard --logdir logs/fit
```
However, TensorBoard shows no graphs, only the message "Failed to fetch runs". Am I missing something, or do I need additional settings in my script to see my logs in TensorBoard?
Your TensorBoard log_dir is not logs/fit; the current date and time are appended to it. Try using logs/fit as log_dir and see if that works.
EDIT
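The training job runs in a separate SageMaker container, not on your notebook instance, so the logs it writes never reach a local logs/fit directory. If you want to use TensorBoard locally, you have to send the TensorBoard logs to S3 and read them from there. To do this, do what your third linked example does and use SageMaker Debugger's TensorBoardOutputConfig: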
```python
from sagemaker.debugger import TensorBoardOutputConfig

tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path='s3://path/for/tensorboard/data/emission',
    container_local_output_path='/local/path/for/tensorboard/data/emission',
)
```
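For this config to have any effect, it has to be passed to the estimator (add tensorboard_output_config=tensorboard_output_config to the TensorFlow(...) call), and the training script has to write its TensorBoard event files under container_local_output_path, which SageMaker then uploads to s3_output_path. Below is a minimal sketch of the script side. It assumes the default container path /opt/ml/output/tensorboard (the value SageMaker uses when no local path is given), and the helper name make_log_dir is hypothetical:

```python
import datetime
import os

# Default local path of TensorBoardOutputConfig; use the same value you pass as
# container_local_output_path if you set one explicitly.
DEFAULT_TENSORBOARD_DIR = "/opt/ml/output/tensorboard"


def make_log_dir(base_dir=DEFAULT_TENSORBOARD_DIR, now=None):
    """Return a timestamped run directory under base_dir (hypothetical helper)."""
    now = now or datetime.datetime.now()
    return os.path.join(base_dir, "fit", now.strftime("%Y%m%d-%H%M%S"))


# In the training script, use it instead of "logs/fit/...":
# tensorboard_callback = tf.keras.callbacks.TensorBoard(
#     log_dir=make_log_dir(), histogram_freq=1)
```

Keeping the timestamped sub-directory means each training job still shows up as a separate run in TensorBoard.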
Then your TensorBoard command will be something like this (note there is no space after AWS_REGION=):

```shell
AWS_REGION=<your-region> AWS_LOG_LEVEL=3 tensorboard --logdir s3://path/for/tensorboard/data/emission
```
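If you would rather start TensorBoard from Python (for example from a notebook cell), the same environment variables can be passed to the process explicitly. This is a minimal sketch that assumes the tensorboard executable is on PATH and that your TensorBoard installation can read s3:// paths; the function names are hypothetical:

```python
import os
import subprocess


def tensorboard_s3_command(s3_logdir, region, log_level=3):
    """Build the argv and environment for reading TensorBoard logs from S3."""
    env = dict(os.environ)                 # keep credentials and PATH from the parent
    env["AWS_REGION"] = region             # region of the S3 bucket
    env["AWS_LOG_LEVEL"] = str(log_level)  # same value as in the shell command
    argv = ["tensorboard", "--logdir", s3_logdir]
    return argv, env


def launch_tensorboard(s3_logdir, region):
    """Start TensorBoard in the background and return the process handle."""
    argv, env = tensorboard_s3_command(s3_logdir, region)
    return subprocess.Popen(argv, env=env)
```

Building the command separately from launching it makes it easy to check what will be run before starting a long-lived process.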
Alternatively, if you want to use TensorBoard in the notebook, do what your second linked example does: install TensorBoard in a cell, start it, and open it at a URL like:
https://<notebook instance hostname>/proxy/6006/
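On a notebook instance this could look roughly like the sketch below. It assumes TensorBoard is installed in the kernel's environment and listens on its default port 6006; the helper names are hypothetical, and the hostname is whatever your notebook instance URL uses:

```python
import subprocess


def notebook_proxy_url(hostname, port=6006):
    """URL under which the notebook instance proxies a local port."""
    return f"https://{hostname}/proxy/{port}/"


def start_tensorboard(logdir, port=6006):
    """Run TensorBoard in the background so the notebook cell returns."""
    return subprocess.Popen(
        ["tensorboard", "--logdir", logdir, "--port", str(port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The logdir passed to start_tensorboard still has to contain the training job's logs (for example the S3 path from the TensorBoardOutputConfig), not an empty local directory.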