Reputation: 1825
I am playing with Azure Databricks. I uploaded a Python file and added it to the spark context with
spark.sparkContext.addPyFile('/location/to/the/file/model.py')
Everything works fine when I run my Python code in the notebook. But when I make a change to model.py and upload it again with --overwrite, the code in the notebook does not pick up the new version of the file; it keeps using the old one until I restart the cluster.
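Roughly, the notebook pattern is the following (the import of model here is just an illustration of how the uploaded file is used):

spark.sparkContext.addPyFile('/location/to/the/file/model.py')
import model  # after overwriting model.py, this still resolves to the old version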
Is there a way to avoid cluster restart whenever I overwrite a file?
Upvotes: 1
Views: 1009
Reputation: 87249
Unfortunately it doesn't work this way - when a file is added, it is distributed to all workers and used there.
But you can do it differently: enable the Repos feature called arbitrary files (don't forget to restart the cluster after enabling it). When you clone your repository into Repos, you can then use the Python files (not notebooks) from that repository as Python packages. Then, in the notebook, you can use the following directives to force a reload of your changes into the notebook environment:
%load_ext autoreload
%autoreload 2
You can see an example of such usage in this demo - Python code from the my_package directory is used in the notebook.
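For instance, something along these lines in the notebook (a minimal sketch; the repo path and the my_package.model module are assumptions about your layout, and the sys.path line is typically only needed when the notebook itself doesn't live inside the Repo):

%load_ext autoreload
%autoreload 2

import sys
sys.path.append('/Workspace/Repos/<user>/<repo-name>')  # hypothetical path to the cloned Repo

from my_package import model  # edits to model.py in the Repo are picked up on the next cell run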
Upvotes: 3