King Dedede

Reputation: 1010

Tensorflow - S3 object does not exist

How do I set up direct private bucket access for Tensorflow?

After running

from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://my/private/bucket/file.json'))

I end up with this error:
NotFoundError: Object s3://my/private/bucket/file.json does not exist

However, the same call on a public object works without an error:
print(file_io.stat('s3://ryft-public-sample-data/wikipedia-20150518.bin'))

There is an article on TensorFlow's S3 support here: https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md
However, I still get the same error after exporting the environment variables it lists.

I have the AWS CLI set up with all credentials, and boto3 can view and download the file in question. How can I give TensorFlow direct access to S3 when the bucket is private?

Upvotes: 4

Views: 1653

Answers (1)

marcin

Reputation: 2955

I had the same problem when trying to access files in a private S3 bucket from a SageMaker notebook. My mistake was trying to use the credentials I obtained from boto3, which do not seem to be valid outside of boto3.

The solution was to not specify credentials at all (in that case the role attached to the machine is used) and instead to specify only the region name, which for some reason was not read from the ~/.aws/config file:

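import os

import boto3

# Let boto3 resolve the region from its usual configuration sources
session = boto3.Session()
# TensorFlow's S3 filesystem reads the region from AWS_REGION;
# region_name is None if no region is configured, so guard against it
if session.region_name:
    os.environ['AWS_REGION'] = session.region_name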
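The same idea can be wrapped in a small helper. This is a minimal sketch under stated assumptions. The name prepare_tf_s3_env is hypothetical. Removing the three credential variables is an assumption that only makes sense when a role is attached to the machine, as in the SageMaker case above. In that situation, stale exported keys would otherwise override the role.

```python
import os
from typing import MutableMapping, Optional

# Standard AWS SDK credential variables. If exported, they take precedence
# over the machine's role (assumption: you want the role to be used).
_CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def prepare_tf_s3_env(
    region: Optional[str],
    environ: MutableMapping[str, str] = os.environ,
) -> MutableMapping[str, str]:
    """Hypothetical helper: set AWS_REGION and leave credentials to the role."""
    if not region:
        # Fail loudly instead of letting S3 return a confusing endpoint error
        raise ValueError("No AWS region found; configure one in ~/.aws/config")
    environ["AWS_REGION"] = region
    # Drop explicitly exported credentials so the attached role is used
    for name in _CREDENTIAL_VARS:
        environ.pop(name, None)
    return environ


# Usage (requires boto3):
#   import boto3
#   prepare_tf_s3_env(boto3.Session().region_name)
```

Passing a plain dict as environ lets you check the behaviour without touching the real process environment.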

NOTE: When debugging this error, it was useful to look at the CloudWatch logs. The S3 client's logs were printed only there and not in the Jupyter notebook. That is where I first saw that:

  1. When I specified credentials from boto3, the error was: The AWS Access Key Id you provided does not exist in our records.
  2. When accessing without the AWS_REGION environment variable set, the error was: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. This is apparently common when you don't specify the bucket's region (see 301 Moved Permanently after S3 uploading).
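The second error means the request went to an endpoint that does not match the bucket's region. If the region in your config is not the bucket's region, you could look the region up instead. Below is a sketch under assumptions. Both helper names are hypothetical. The usage lines assume boto3 and permission for s3:GetBucketLocation. S3_ENDPOINT is the variable named in the TensorFlow S3 article linked in the question. The endpoint format assumes the standard amazonaws.com partition, so China regions would differ.

```python
from typing import Optional


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Hypothetical helper: normalize get_bucket_location's LocationConstraint."""
    # S3 returns null (None in boto3) for buckets in us-east-1
    if not constraint:
        return "us-east-1"
    # Some older eu-west-1 buckets report the legacy value "EU"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def s3_regional_endpoint(region: str) -> str:
    """Hypothetical helper: regional S3 host name (standard AWS partition only)."""
    return f"s3.{region}.amazonaws.com"


# Usage (requires boto3; "my" is the bucket from the question's example path):
#   import os, boto3
#   loc = boto3.client("s3").get_bucket_location(Bucket="my")["LocationConstraint"]
#   region = region_from_location_constraint(loc)
#   os.environ["AWS_REGION"] = region
#   os.environ["S3_ENDPOINT"] = s3_regional_endpoint(region)
```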

Upvotes: 4
