Reputation: 1029
Has anyone figured out how to install packages on an AWS SageMaker notebook instance so that they are available in the PySpark kernel? I have made several attempts, including lifecycle configuration scripts, but it seems I keep targeting the wrong Python environment. The package in question is joblib,
although I suspect that shouldn't matter.
Upvotes: 4
Views: 3391
Reputation: 1
You can install any package as follows:
sc.install_pypi_package("pandas==0.25.1")
See this blog for details.
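As a minimal sketch of that approach for the joblib package from the question (assuming the notebook is attached to an EMR cluster on release 5.26+, where the notebook-scoped library API is available, and sc is the SparkContext the PySpark kernel creates):
# Run these in a PySpark kernel cell; they execute on the EMR cluster.
sc.install_pypi_package("joblib")  # install for this notebook session only
sc.list_packages()                 # verify joblib now appears in the session's package list
import joblib                      # should succeed once the install completes
Note that packages installed this way are scoped to the notebook session and do not persist on the cluster after the session ends.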
Upvotes: 0
Reputation: 422
Thanks for using Amazon SageMaker!
Unlike the other kernels, the PySpark kernel only runs once there is an EMR cluster to connect to, whereas a lifecycle configuration script runs before the notebook instance reaches the InService state. So you cannot use a lifecycle configuration to install packages for the PySpark kernel; packages can only be installed after the kernel has started and connected to the EMR cluster.
To install packages for the PySpark kernel, you can run pip install <package_name>
once the kernel is up, and the command will execute on the EMR cluster's master node.
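A hedged sketch of what running pip that way can look like for joblib: since the PySpark kernel ships cell contents to the cluster rather than to a local shell, one way to invoke pip there is via subprocess from a notebook cell (an assumption about how to realize the pip install step, not the only way):
import subprocess, sys
# Runs pip with the driver's Python on the EMR master node, where this
# kernel's Spark driver lives; --user avoids needing root privileges there.
subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", "joblib"])
import joblib  # should now be importable in driver-side code
Keep in mind this installs the package only on the master node where the driver runs; if the executors also need it, a notebook-scoped install like sc.install_pypi_package from the other answer, or an EMR bootstrap action, covers the whole cluster.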
Thanks,
Neelam
Upvotes: 1