Optimus

Reputation: 1825

Uploading a new version of a Python file to Azure Databricks requires cluster restart

I am playing with Azure Databricks. I uploaded a Python file and added it to the spark context with

spark.sparkContext.addPyFile('/location/to/the/file/model.py')

Everything works fine when I run my Python code in the notebook. But when I make a change to the model.py file and upload it with --overwrite, the code in the notebook does not pick up the new version of the file; it still uses the old one until I restart the cluster.

Is there a way to avoid cluster restart whenever I overwrite a file?

Upvotes: 1

Views: 1009

Answers (1)

Alex Ott

Reputation: 87249

Unfortunately it doesn't work this way - when a file is added, it's distributed to all workers and used there.

But you can do it differently - use the Repos feature called arbitrary files (don't forget to restart the cluster after enabling it). When you clone your repository into Repos, you can use Python files (not notebooks) from that repository as Python packages. Then, in the notebook, you can use the following directives to force a reload of your changes into the notebook environment:

%load_ext autoreload
%autoreload 2

You can see an example of such usage in this demo - Python code from the my_package directory is used in the notebook.
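For illustration, assuming the cloned repo contains a package directory my_package with a module model.py that defines a predict function (these names are just placeholders), a notebook cell inside that repo could look like this:

%load_ext autoreload
%autoreload 2

# With arbitrary files enabled, the repo root is on sys.path for notebooks
# in that repo, so the package can be imported directly.
from my_package import model

# After editing my_package/model.py in the Repo, re-running this cell picks
# up the change via autoreload - no cluster restart needed.
print(model.predict([1.0, 2.0, 3.0]))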

Upvotes: 3
