Dimitris Poulopoulos

Reputation: 1159

How to read a file during training in AWS SageMaker?

I'm trying to train a custom TensorFlow model using AWS SageMaker. In the model_fn function that I have to provide, I want to read an external file. I've uploaded the file to S3 and am trying to read it like this:

BUCKET_PATH = 's3://<bucket_name>/data/<prefix>/'

def model_fn(features, labels, mode, params):
    # Load vocabulary
    vocab_path = os.path.join(BUCKET_PATH, 'vocab.pkl')
    with open(vocab_path, 'rb') as f:
        vocab = pickle.load(f)
    n_vocab = len(vocab)
    ...

I get an IOError: [Errno 2] No such file or directory

How can I read this file during training?

Upvotes: 2

Views: 1006

Answers (1)

Raman

Reputation: 673

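I don't think this can work as written: the built-in open() only understands local filesystem paths, so it cannot read an s3:// URL, and pickle.load never gets a file to read. You can either keep the file on the local filesystem next to your code, or download it from S3 first using the boto3 client.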

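Moreover, you probably don't want to download it inside model_fn. That function can be called more than once during a job (for example, separately for training and for evaluation), so the download would be repeated. Generally, data is loaded and prepared in train_input_fn.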
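As a rough sketch of the boto3 route (not tested on SageMaker itself), something like the following downloads the object to local disk once and then unpickles it. The helper names split_s3_uri and load_pickle_from_s3, the /tmp default, and the my-bucket URI in the usage note are my own placeholders, and I'm assuming Python 3, that boto3 is installed in the training image, and that the job's role is allowed to read the object.

```python
import os
import pickle
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('not an S3 URI: %r' % uri)
    return parsed.netloc, parsed.path.lstrip('/')


def load_pickle_from_s3(uri, local_dir='/tmp', s3_client=None):
    """Download the object to local disk (if not already there) and unpickle it.

    Hypothetical helper: local_dir and the caching behaviour are assumptions.
    """
    if s3_client is None:
        # Imported lazily; assumed to be available in the training image.
        import boto3
        s3_client = boto3.client('s3')
    bucket, key = split_s3_uri(uri)
    local_path = os.path.join(local_dir, os.path.basename(key))
    if not os.path.exists(local_path):
        # Skip the download when an earlier call already fetched the file.
        s3_client.download_file(bucket, key, local_path)
    with open(local_path, 'rb') as f:
        return pickle.load(f)
```

You would call it once, outside model_fn, for example vocab = load_pickle_from_s3('s3://my-bucket/data/vocab.pkl'), and then hand the result (or just len(vocab)) to the model through params, so that model_fn itself never touches S3.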

Upvotes: 1
