Reputation: 5063
I'm trying to use a module in a Databricks notebook but I am completely blocked. I'd like to execute the following command, or anything similar, which allows me to make instances of MyClass:
from mypackage.mymodule import MyClass
Following the Databricks documentation, I have developed a Python package with a single module locally, as follows:
mypackage
|- __init__.py
|- setup.py
|- mymodule.py
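For reference, a minimal setup.py for this flat layout could look like the sketch below (the name and version are inferred from the src-0.1 wheel that gets built; the py_modules line is an assumption, not necessarily the exact file):

# Minimal setup.py sketch for this flat layout.
# name/version inferred from the built wheel; py_modules is an assumption.
from setuptools import setup

setup(
    name='src',
    version='0.1',
    py_modules=['mymodule'],
)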
Then I run python setup.py bdist_wheel, obtaining a .whl file. The directory ends up being:
mypackage
|- build
|  |- ... whatever
|- src.egg-info
|  |- ... whatever
|- dist
|  |- src-0.1-py3-none-any.whl
|- __init__.py
|- setup.py
|- mymodule.py
From here I've uploaded the .whl file into the Workspace following the instructions. But now I'm not able to import MyClass into any notebook.
I've tried all the approaches below:

- uploading the .whl with and without a name
- installing it into the cluster and not
- import mypackage
- dbutils.library.install('dbfs:/path/to/mypackage.whl/') (which returns True) and then use import ...
- instead of the .whl, creating the package folder in the same directory as the notebook and in the Shared folder
- import differentname
This is driving me crazy. It's such a simple task, which I can achieve easily with regular notebooks.
Upvotes: 10
Views: 14663
Reputation: 596
For anyone else trying to solve this in a Databricks Workspace, without using Repos, the key seems to be ensuring your module code is a File and not a Notebook.
Here is a minimal example, which works for me on the 12.2 LTS runtime.
testmod.py: (File, not Notebook)
def hello():
    print('Hello')
Any Notebook in the same folder:
import testmod
testmod.hello()
If your module is in a subfolder/package called testpackage, you can do:
from testpackage import testmod
testmod.hello()
If your module is in a higher-level folder, you may need to add the path to sys.path. The following worked for me:
import os
import sys
sys.path.append(os.path.abspath("/Workspace/Shared/"))
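After that, a module sitting directly under /Workspace/Shared can be imported by name. For example, assuming a hypothetical file /Workspace/Shared/mymod.py with a hello() function like the one above:

# Hypothetical: assumes /Workspace/Shared/mymod.py defines hello().
import mymod
mymod.hello()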
If you are uploading your code via the API or CLI, you can make it a File rather than a Notebook by following this answer: https://stackoverflow.com/a/77580533/19734178.
Upvotes: 2
Reputation: 964
With the introduction of support for arbitrary files in Databricks Repos, it is now possible to import custom modules/packages easily, if the module/package resides in the linked git repo.

First, both the Repos feature and support for arbitrary files in Repos need to be enabled. Both of these can be enabled from Settings -> Admin Console -> Workspace Settings.
Then, with the following directory structure in the git repo,
.
├── mypackage
│ ├── __init__.py
│ └── mymodule.py
└── test_notebook
it is possible to import the module mymodule in the package mypackage from test_notebook simply by executing the following statement:
# This is test_notebook in the above filetree
from mypackage.mymodule import MyClass
Upvotes: 3
Reputation: 5063
I've solved this by using Python's egg instead of wheel. python setup.py bdist_egg will create an egg which you can install following the Databricks docs. I don't know why the wheel doesn't work...
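For completeness, the egg can be installed from a notebook the same way I originally tried with the wheel (the path below is a placeholder for wherever you upload the egg):

# Placeholder path; point this at your uploaded egg.
dbutils.library.install('dbfs:/path/to/mypackage.egg')
dbutils.library.restartPython()  # restart Python so the library is importable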
Upvotes: 2