Sujay DSa

Reputation: 1192

How do I load python modules which are not available in Sagemaker?

I want to install spacy, which is not available as part of the SageMaker platform. How can I pip install it?

Upvotes: 2

Views: 3443

Answers (2)

vincent_zhang

Reputation: 93

Great answer from Raman. I wanted to add another way of specifying the required python modules in the training instance, in case someone is looking.

from sagemaker.tensorflow import TensorFlow

# requirements_file is resolved relative to source_dir
tf_estimator = TensorFlow(entry_point='tf-train.py', role='SageMakerRole',
                          training_steps=10000, evaluation_steps=100,
                          train_instance_count=1,
                          source_dir='./',
                          requirements_file='requirements.txt',
                          train_instance_type='ml.p2.xlarge')

Both source_dir and requirements_file have to be defined for this to work. The source_dir path is relative to the notebook instance, so if requirements.txt is in the same directory as the notebook, just use './'.

The docs are here.
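In case it helps, here is a minimal sketch of generating the requirements.txt next to the entry point before creating the estimator. The helper name `write_requirements` is hypothetical (it is not part of the SageMaker SDK), and the package list is just the spacy example from the question.

```python
import os
import tempfile


def write_requirements(source_dir, packages):
    """Write one package per line to <source_dir>/requirements.txt.

    Hypothetical helper for illustration; returns the path it wrote.
    """
    os.makedirs(source_dir, exist_ok=True)
    path = os.path.join(source_dir, "requirements.txt")
    with open(path, "w") as f:
        f.write("\n".join(packages) + "\n")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing in the notebook folder is overwritten.
    demo_dir = tempfile.mkdtemp()
    print(write_requirements(demo_dir, ["spacy"]))
```

In a real notebook you would pass the same directory you give to source_dir, for example `write_requirements('./', ['spacy'])`.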

Upvotes: 5

Raman

Reputation: 673

When creating your model, you can specify the requirements.txt file through an environment variable.

For example:

from sagemaker.tensorflow import TensorFlowModel

env = {
    'SAGEMAKER_REQUIREMENTS': 'requirements.txt', # path relative to `source_dir` below.
}
sagemaker_model = TensorFlowModel(model_data = 's3://mybucket/modelTarFile',
                                  role = role,
                                  entry_point = 'entry.py',
                                  code_location = 's3://mybucket/runtime-code/',
                                  source_dir = 'src',
                                  env = env,
                                  name = 'model_name',
                                  sagemaker_session = sagemaker_session,
                                 )

This ensures that the packages listed in the requirements file are installed after the Docker container is created and before any of your code runs in it.
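Since the path is relative to source_dir, a misplaced file is easy to miss until the container starts. A minimal sketch of a local check you could run before creating the model, assuming the same `env` and `source_dir` values as above. `check_requirements_path` is a hypothetical helper, not something the SageMaker SDK provides.

```python
import os


def check_requirements_path(source_dir, env):
    """Return the full local path of the requirements file named in env.

    Hypothetical pre-flight check: the value of SAGEMAKER_REQUIREMENTS is
    joined onto source_dir, and FileNotFoundError is raised if no file is there.
    """
    relative_path = env["SAGEMAKER_REQUIREMENTS"]
    full_path = os.path.join(source_dir, relative_path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(
            "Expected a requirements file at %s (relative to source_dir)" % full_path
        )
    return full_path
```

Calling `check_requirements_path('src', env)` before constructing the model would fail fast if src/requirements.txt does not exist locally.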

Upvotes: 10
