gapvision

Reputation: 1029

Install Python package to PySpark kernel in SageMaker notebooks

Has anybody figured out how to install packages on AWS SageMaker Notebook instances so that they are available in the PySpark kernel? I have made several attempts by now, including the lifecycle scripts, but it seems I keep missing the right Python env. The package in question is joblib, but I guess that shouldn't matter?!

Upvotes: 4

Views: 3391

Answers (2)

Sowmya Kuruba

Reputation: 1

You can install any package as follows:

sc.install_pypi_package("pandas==0.25.1")

See this blog for details.
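
For the asker's joblib case, a minimal sketch of the same approach (install_pypi_package and list_packages are part of EMR's notebook-scoped libraries feature, available on EMR release 5.26.0 and later; the version print at the end is just illustrative):

sc.install_pypi_package("joblib")  # notebook-scoped: lives only for this Spark session

sc.list_packages()  # verify the package now shows up for the session

import joblib
print(joblib.__version__)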

Upvotes: 0

Neelam Gehlot

Reputation: 422

Thanks for using Amazon SageMaker!

The PySpark kernel, unlike the other kernels, only runs when there is an EMR cluster to connect to, whereas a Lifecycle Configuration runs before the notebook instance is put InService. So you cannot use a Lifecycle Configuration to install packages for the PySpark kernel; packages can only be installed after the kernel has started and connected to the EMR cluster.

To install packages in the PySpark kernel, you can run pip install <package_name> once the kernel has started; it will execute the command on the EMR cluster master.
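
One way to issue that pip command from a notebook cell is via subprocess (a sketch, assuming the cell's code runs on the EMR master as described above; joblib, the asker's package, stands in for the package name):

import subprocess
import sys

# Run pip against the same interpreter the PySpark kernel's code
# executes under, so the install happens on the EMR cluster master.
subprocess.check_call([sys.executable, "-m", "pip", "install", "joblib"])

import joblib  # should import successfully once the install finishes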

Thanks,

Neelam

Upvotes: 1
