Reputation: 139
The following is the command I used to create the Dataproc cluster. It runs two initialization scripts: (1) jupyter.sh and (2) my_initialize.sh.
gcloud dataproc clusters create dproc \
--subnet default --zone us-west1-a --project myproject \
--initialization-actions gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://mydataproc/my_initialize.sh \
--master-machine-type n1-standard-8 --master-boot-disk-size 40 \
--worker-machine-type n1-standard-8 --worker-boot-disk-size 40 --num-workers 4
The following is the content of my_initialize.sh:
#!/usr/bin/env bash
pip install --upgrade google-cloud-bigquery
I believe pip is already installed by the time jupyter.sh has run.
For some reason, cluster creation fails with the error "line 2: pip command not found".
Upvotes: 1
Views: 558
Reputation: 2158
I believe this is an issue where the init action does not see the changes that previous init actions made to the environment. We will be rolling out a fix for this in the next few weeks, so sourcing profile.d should not be necessary after that. The fix will be announced in the release notes.
In the meantime, as @Karthik Palaniappan mentions, just call pip by its full path, /opt/conda/bin/pip.
Finally, on the Dataproc 1.3 image you can use the Anaconda and Jupyter Optional Components. Using components instead of init actions will cut down on overall cluster boot time.
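As a rough sketch of what that could look like, the command below mirrors the flags from the question but swaps the jupyter.sh init action for Optional Components. The `--image-version` and `--optional-components` flags are my assumption about how the components are requested; depending on your gcloud version they may only be available on the beta track. Note that pip may not live at the same path on such a cluster, so my_initialize.sh is left out here.

```shell
# Sketch only: same cluster as in the question, but Anaconda and Jupyter come
# from Optional Components instead of the jupyter.sh init action.
# Assumption: --image-version and --optional-components are accepted by your gcloud release.
gcloud dataproc clusters create dproc \
  --subnet default --zone us-west1-a --project myproject \
  --image-version 1.3 \
  --optional-components ANACONDA,JUPYTER \
  --master-machine-type n1-standard-8 --master-boot-disk-size 40 \
  --worker-machine-type n1-standard-8 --worker-boot-disk-size 40 --num-workers 4
```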
Upvotes: 1
Reputation: 1383
Yeah, this is because neither pip nor anything else in /opt/conda/bin/ is in $PATH for your second init action. In fact, they don't end up on the path for the root user, even if you run sudo su root: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/issues/246.
If you're interested in fixing that issue, I'd be happy to accept a PR. Just as a starting point: bootstrap-conda.sh sets up /etc/profile.d/conda.sh, and other scripts source that file explicitly.
Unless there's a simple way to change $PATH systemwide, I think your best bet is to explicitly source /etc/profile.d/conda.sh as well.
Alternatively, run pip with its absolute path, e.g. /opt/conda/bin/pip install ...
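Here is a minimal sketch that combines both suggestions for my_initialize.sh: it sources the profile script if it exists, then falls back to the absolute path if pip is still not on $PATH. The helper name `resolve_pip` and its two optional arguments are hypothetical, not part of any Dataproc script; the default paths are the ones mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a later init action such as my_initialize.sh.

# Print the pip executable to use.
#   $1: profile script written by the earlier init action (default taken from this thread)
#   $2: absolute fallback path to pip (default taken from this thread)
resolve_pip() {
  local profile="${1:-/etc/profile.d/conda.sh}"
  local fallback="${2:-/opt/conda/bin/pip}"

  # Pick up the PATH changes that the earlier init action wrote to profile.d.
  if [ -r "${profile}" ]; then
    # "." is the portable spelling of "source".
    . "${profile}"
  fi

  # Prefer whatever pip is now on PATH; otherwise print the absolute path.
  command -v pip || echo "${fallback}"
}
```

The last line of my_initialize.sh could then be `"$(resolve_pip)" install --upgrade google-cloud-bigquery`, which should work whether or not the earlier init action's environment changes are visible.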
Upvotes: 0