Yosi Pramajaya

Reputation: 4095

SageMaker Estimator.fit() didn't pass the 'train' input to the Training instance

As mentioned in the documentation and tutorials, we can call Estimator.fit() to start a training job.

The required parameter for the method is inputs, which is an S3 or file reference to the training data. Example:

estimator.fit({'train': 's3://my-bucket/training_data'})

training-script.py

parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])

I would expect os.environ['SM_CHANNEL_TRAIN'] to be the S3 path. But instead, it returns /opt/ml/input/data/train.

Does anyone know why?

Update

I also tried calling estimator.fit('s3://my-bucket/training_data'). In that case the training instance didn't get the SM_CHANNEL_TRAIN environment variable at all. In fact, I didn't see the S3 URI in any of the environment variables.

Upvotes: 3

Views: 2733

Answers (2)

ByungWook

Reputation: 384

When you run a training job in SageMaker, the training data at the S3 URL you provide is copied into the Docker container that runs the training job. The environment variable SM_CHANNEL_TRAIN therefore points to the local path of the copied data, not to the S3 URL itself.

https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html#SageMaker-CreateTrainingJob-request-InputDataConfig
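To illustrate the point above, here is a minimal sketch of a training script that treats SM_CHANNEL_TRAIN as a local directory and reads files from it. The helper names parse_args and list_training_files, and the --train-s3-uri argument, are hypothetical and not part of SageMaker; the latter only shows one way to hand the original S3 URI to the script if you really need it there.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # SM_CHANNEL_TRAIN holds the local directory the "train" channel was
    # copied to (typically /opt/ml/input/data/train), not the S3 URI.
    parser.add_argument(
        "--train",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    # Hypothetical argument: only populated if you pass the S3 URI yourself,
    # for example as a hyperparameter on the estimator.
    parser.add_argument("--train-s3-uri", type=str, default=None)
    # parse_known_args ignores any extra arguments the platform may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args


def list_training_files(train_dir):
    """Return the sorted paths of all files under the local channel directory."""
    paths = []
    for root, _dirs, files in os.walk(train_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


if __name__ == "__main__":
    args = parse_args()
    print("local channel dir:", args.train)
    print("original S3 URI (if passed):", args.train_s3_uri)
    for path in list_training_files(args.train):
        print(path)
```

If the script needs the S3 URI, one option (assuming script mode, where hyperparameters reach the entry point as command-line arguments) is to set something like hyperparameters={'train-s3-uri': 's3://my-bucket/training_data'} on the estimator. Also, as far as I know, passing a bare string to fit() names the channel training, so the variable would be SM_CHANNEL_TRAINING rather than SM_CHANNEL_TRAIN; that would explain the behaviour in the update.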

Upvotes: 3

Farrago-Alex

Reputation: 120

This is most likely because os.environ['SM_CHANNEL_TRAIN'] doesn't return a path with the s3:// prefix, which it would need if you expect it to pull the data from S3. Without that prefix, it searches the image's local file system for that path.

Upvotes: 0
