Indhu Bharathi

Reputation: 1478

Can't read from tfrecords in S3 from notebook instance

I am trying to read TFRecords from S3 in a SageMaker notebook instance, following the instructions here: https://www.tensorflow.org/versions/master/deploy/s3

import tensorflow as tf
import os
os.environ['AWS_ACCESS_KEY_ID'] = '<my-key>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<my-secret>'

from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://<my-bucket>/data/DEMO-mnist/train.tfrecords'))

The above code fails with the error:

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
<ipython-input-7-770c0aef6d7b> in <module>()
      1 from tensorflow.python.lib.io import file_io
----> 2 print(file_io.stat('s3://<my-bucket>/data/DEMO-mnist/train.tfrecords'))

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py in stat(filename)
    551   with errors.raise_exception_on_not_ok_status() as status:
    552     pywrap_tensorflow.Stat(compat.as_bytes(filename), file_statistics, status)
--> 553     return file_statistics

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

NotFoundError: Object s3://<my-bucket>/data/DEMO-mnist/train.tfrecords does not exist

However, the same code works fine when I run it on a regular EC2 instance without SageMaker.

The IAM role used for the notebook instance has full S3 access.

Upvotes: 0

Views: 980

Answers (1)

SphericalCow

Reputation: 176

I reproduced the problem in us-west-2.

After I manually exported the environment variable AWS_REGION='us-west-2', it worked.

I also tried without exporting AWS_REGION against a bucket in us-east-1, and that worked too.

So for some reason, the region information in the AWS profile is not being picked up. If the AWS_REGION environment variable is not set, the region always falls back to the default, us-east-1.
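From inside the notebook, the same workaround might look like the sketch below. It assumes the bucket is in us-west-2 (as in my reproduction) and reuses the placeholder bucket path from the question. The helper name stat_s3_object is only for illustration. Setting the variable before the first S3 access is the safe ordering, since I have not checked when TensorFlow reads it.

```python
import os

# Assumption: the bucket lives in us-west-2, as in the reproduction above.
# Set the region before TensorFlow first touches an s3:// path.
os.environ["AWS_REGION"] = "us-west-2"


def stat_s3_object(path):
    """Stat an s3:// path with TensorFlow's file_io.

    Hypothetical helper for illustration; TensorFlow is imported lazily so
    the environment variable is already in place when it loads.
    """
    from tensorflow.python.lib.io import file_io
    return file_io.stat(path)


# Usage (placeholder bucket name, as in the question):
# print(stat_s3_object("s3://<my-bucket>/data/DEMO-mnist/train.tfrecords"))
```

If the bucket is in a different region, substitute that region name instead.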

Upvotes: 2
