Reputation: 1914
I'm trying to access TensorBoard for the tensorflow_resnet_cifar10_with_tensorboard
example, but I'm not sure what the URL should be. The help text gives two options:
You can access TensorBoard locally at http://localhost:6006 or using your SageMaker notebook instance proxy/6006/(TensorBoard will not work if forget to put the slash, '/', in end of the url). If TensorBoard started on a different port, adjust these URLs to match.
When it says access locally, does that mean the local container SageMaker creates in AWS? If so, how do I get there?
Or, if I use run_tensorboard_locally=False, what should the proxy URL be?
Upvotes: 12
Views: 8642
Reputation: 497
You can find a more detailed tutorial here: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tensorboard.html
You can save your logs like this:
import datetime
import os
LOG_DIR = os.path.join(os.getcwd(), "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
EFS_PATH_LOG_DIR = "/".join(LOG_DIR.strip("/").split('/')[1:-1])
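To make the second line less cryptic, here is a minimal sketch of what that expression computes. The example path and the function name are hypothetical, chosen only for illustration; your working directory will differ.

```python
def efs_relative_log_dir(log_dir: str) -> str:
    """Drop the first path component and the final (timestamp) component."""
    parts = log_dir.strip("/").split("/")
    return "/".join(parts[1:-1])


# Hypothetical LOG_DIR value, not taken from a real Studio session.
example_log_dir = "/root/my-project/logs/fit/20240101-120000"
print(efs_relative_log_dir(example_log_dir))  # my-project/logs/fit
```

So the value you pass to --logdir is the parent folder holding all timestamped runs, without the leading directory.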
Then launch TensorBoard by following these steps:
Open a new Terminal.
Install TensorBoard and launch it (copy EFS_PATH_LOG_DIR from the Jupyter notebook):
pip install tensorboard
tensorboard --logdir <EFS_PATH_LOG_DIR>
Open Tensorboard:
https://<YOUR_Notebook_URL>.studio.region.sagemaker.aws/jupyter/default/proxy/6006/
If you store your logs in an S3 bucket, you can launch it again from the Terminal with:
AWS_REGION=region tensorboard --logdir s3://bucket_name/logs/
and then go to the same URL again: https://<YOUR_Notebook_URL>.studio.region.sagemaker.aws/jupyter/default/proxy/6006/
Upvotes: 0
Reputation: 321
Here is my solution:
If the URL of my SageMaker notebook instance is:
https://myinstance.notebook.us-east-1.sagemaker.aws/notebooks/image_classify.ipynb
then the URL for accessing TensorBoard is:
https://myinstance.notebook.us-east-1.sagemaker.aws/proxy/6006/
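If you'd rather not edit the URL by hand, the same mapping can be sketched in a few lines of Python. This is only an illustration of the pattern above (keep the host, replace the path with proxy/<port>/); the function name is made up, and it assumes a classic notebook instance URL, not a Studio one.

```python
from urllib.parse import urlsplit, urlunsplit


def tensorboard_proxy_url(notebook_url: str, port: int = 6006) -> str:
    """Keep scheme and host of the notebook URL, swap the path for /proxy/<port>/."""
    parts = urlsplit(notebook_url)
    # The trailing slash matters: the help text says TensorBoard won't load without it.
    return urlunsplit((parts.scheme, parts.netloc, f"/proxy/{port}/", "", ""))


print(tensorboard_proxy_url(
    "https://myinstance.notebook.us-east-1.sagemaker.aws/notebooks/image_classify.ipynb"
))
```

Pass a different port if TensorBoard started on something other than 6006.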
Upvotes: 21
Reputation: 353
You can access TensorBoard on your notebook using the link "proxy/6006".
If you set run_tensorboard_locally=False then it won't start TensorBoard.
If the URL you clicked gives you the error "[Errno 111] Connection refused", training has probably already stopped. According to https://github.com/aws/sagemaker-python-sdk, the SDK "terminates TensorBoard when the execution ends", so it seems you can only access it while the training step is running.
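A quick way to check whether anything is still listening before you click the link is to probe the port from a notebook cell. This is a rough sketch, assuming TensorBoard runs on the same machine as the notebook and on the default port 6006; the function name is hypothetical.

```python
import socket


def tensorboard_is_listening(host: str = "localhost", port: int = 6006,
                             timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (Errno 111 on Linux) and timeouts.
        return False


print(tensorboard_is_listening())
```

If this prints False, the proxy link will most likely show the connection refused error as well.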
Upvotes: 3
Reputation: 96
"Local" there refers to the machine which is running the estimator.fit method. So if you are running the example notebook on a SageMaker notebook instance, tensorboard will be running on that machine.
The "proxy/6006" part of the text you quoted is a clickable link which will bring up TensorBoard on your notebook. The full URL will be "https://<notebook-instance-name>.notebook.<region>.sagemaker.aws/proxy/6006/".
Upvotes: 1