Pablo Brenner

Reputation: 193

How to install python packages in a Google Dataproc cluster

Is it possible to install python packages in a Google Dataproc cluster after the cluster is created and running?

I tried running "pip install xxxxxxx" on the master node's command line, but it does not seem to work.

Google's Dataproc documentation does not mention this situation.

Upvotes: 12

Views: 15398

Answers (1)

tix

Reputation: 2158

This is generally not possible after the cluster is created. I recommend using an initialization action, which runs on every node when the cluster is created.

As you've noticed, pip is also not available by default, so you'll want to run easy_install pip first, followed by your pip install command.
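To make the order of those two steps concrete, here is a minimal sketch in Python that builds the commands and optionally runs them. Initialization actions are usually written as shell scripts, so treat this as an illustration of the sequence rather than a required form. The function names and the example package names are hypothetical, and it assumes easy_install is on the PATH of the node.

```python
import subprocess


def build_install_commands(packages):
    """Return the two steps in order: bootstrap pip, then install the packages."""
    if not packages:
        raise ValueError("at least one package name is required")
    return [
        ["easy_install", "pip"],        # pip is assumed to be missing on the node
        ["pip", "install", *packages],  # one pip call for all requested packages
    ]


def run_commands(commands, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical package names; dry_run=True only prints the commands.
    run_commands(build_install_commands(["requests", "numpy"]))
```

With dry_run left at True the script only prints the commands, which is handy for checking them before they run on every node.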

Finally, if you intend to use this cluster in any automation, and/or you want hermeticity, I recommend building a wheel, storing it in GCS, and downloading it in the init action. You would then install the wheel. Wheels have the added benefit of installing faster than pulling many packages from pip directly.
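The wheel approach could look roughly like the sketch below: copy the wheel from GCS with gsutil cp, then install the local file with pip. The bucket path, wheel filename, and function name are hypothetical, and it assumes gsutil and pip are both available on the node when the init action runs.

```python
import posixpath
import subprocess


def build_wheel_commands(wheel_uri, local_dir="/tmp"):
    """Return the commands to copy a wheel from GCS and install it locally."""
    if not (wheel_uri.startswith("gs://") and wheel_uri.endswith(".whl")):
        raise ValueError("expected a gs:// URI that points to a .whl file")
    # Keep the wheel's original filename; pip reads metadata from the name.
    local_path = posixpath.join(local_dir, posixpath.basename(wheel_uri))
    return [
        ["gsutil", "cp", wheel_uri, local_path],
        ["pip", "install", local_path],
    ]


if __name__ == "__main__":
    # Hypothetical bucket and wheel name; replace with your own.
    uri = "gs://my-bucket/wheels/mypkg-1.0-py2.py3-none-any.whl"
    for argv in build_wheel_commands(uri):
        print(" ".join(argv))  # swap print for subprocess.run(argv, check=True)
```

Because the wheel is pinned to a specific file in your own bucket, every node installs the same build regardless of what is published on PyPI at cluster creation time.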

2019 Update

See this tutorial on how to configure Python environment on Dataproc: https://cloud.google.com/dataproc/docs/tutorials/python-configuration

Upvotes: 10
