Reputation: 649
I have been working in Databricks notebooks using Python and R. Once a job is done we terminate the cluster to save on cost (since we are billed while the machine is running).
That means we have to start the cluster again whenever we want to work in a notebook, and I have seen that it takes a lot of time because all the packages are installed again on the cluster. Is there any way to avoid this installation every time we start the cluster?
Upvotes: 4
Views: 3792
Reputation: 3751
Update: Databricks now allows custom docker containers.
Unfortunately not.
When you terminate a cluster, its memory state is lost, so when you start it again it comes back up from a clean image. Even if you add the desired packages to an init script, they will still have to be installed at every initialization.
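For illustration, a minimal sketch of such a cluster-scoped init script, assuming example package names and a script path of your choosing (it runs on every node each time the cluster starts, which is exactly why the installs are repeated):

#!/bin/bash
set -e
# Hypothetical init script, e.g. uploaded to dbfs:/databricks/init/install-packages.sh
# and attached to the cluster; it executes on every cluster start, so these installs recur.
/databricks/python/bin/pip install --quiet pandas scikit-learn
# Assumes an R-enabled runtime; package names are placeholders.
Rscript -e 'install.packages("data.table", repos = "https://cran.r-project.org")'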
You may ask Databricks support to check if it is possible to create a custom cluster image for you.
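Building on the update above about custom Docker containers, a rough sketch of what such an image could look like with Databricks Container Services, assuming example packages (verify the base image tag and pip path against the Databricks container documentation for your runtime):

# Dockerfile sketch: bake the packages into the image so cluster start can skip the installs
FROM databricksruntime/standard:latest
RUN /databricks/python3/bin/pip install pandas scikit-learn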
Upvotes: 2
Reputation: 171
I am using a conda environment to install the packages. After the first installation, I save the environment as a YAML file in DBFS and reuse the same YAML file in all other runs. This way I don't have to set up the packages by hand again.
Save the environment as a conda YAML specification.
%conda env export -f /dbfs/filename.yml
Apply the saved file in another notebook (or after a cluster restart) using conda env update.
%conda env update -f /dbfs/filename.yml
List the installed packages to verify:
%conda list
Upvotes: 0