lsmor

Reputation: 5063

How to import a local module into azure databricks notebook?

I'm trying to use a module in a Databricks notebook but I am completely blocked. I'd like to execute the following command, or anything similar, which allows me to create instances of MyClass:

from mypackage.mymodule import MyClass

Following the Databricks documentation, I have developed a Python package with a single module locally, as follows:

mypackage
|- __init__.py
|- setup.py
|- mymodule.py

Then I run python setup.py bdist_wheel, obtaining a .whl file. The directory ends up being:

mypackage
|- build
   |- ... whatever
|- src.egg-info
   |- ... whatever
|- dist
   |- src-0.1-py3-none-any.whl
|- __init__.py
|- setup.py
|- mymodule.py
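
For reference, a setup.py along these lines would produce a wheel like the one above. This is only a sketch, since the question doesn't show the actual file; the name and version are guesses (the wheel filename src-0.1-py3-none-any.whl suggests the questioner's setup.py used name="src"):

```python
# setup.py -- minimal sketch; name and version are placeholders
from setuptools import setup

setup(
    name="src",
    version="0.1",
    py_modules=["mymodule"],  # a single top-level module, as in the tree above
)
```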

From there I've uploaded the .whl file into the Workspace following the instructions, but now I'm not able to import MyClass into any notebook.

I've tried all the approaches below:

This is driving me crazy. It's such a simple task, and one I can achieve easily with regular notebooks.

Upvotes: 10

Views: 14663

Answers (3)

Mike

Reputation: 596

For anyone else trying to solve this in a Databricks Workspace, without using Repos, the key seems to be ensuring your module code is a File and not a Notebook.

Here is a minimal example, which works for me on the 12.2 LTS runtime.

testmod.py: (File, not Notebook)

def hello():
    print('Hello')

Any Notebook in the same folder:

import testmod
testmod.hello()

If your module is in a subfolder/package called testpackage, you can do:

from testpackage import testmod
testmod.hello()

If your module is in a higher-level folder, you may need to add the path to sys.path. The following worked for me:

import os
import sys
sys.path.append(os.path.abspath("/Workspace/Shared/"))

If you are uploading your code via the API or CLI, you can make it a File rather than a Notebook by following this answer: https://stackoverflow.com/a/77580533/19734178.

Upvotes: 2

fskj

Reputation: 964

With the introduction of support for arbitrary files in Databricks Repos, it is now possible to import custom modules/packages easily, provided the module/package resides in the linked Git repo.

First,

  1. Make sure Repos for Git integration is enabled.
  2. Make sure support for arbitrary files is enabled.

Both of these can be enabled from Settings -> Admin Console -> Workspace Settings.

Then, with the following directory structure in the git repo,

.
├── mypackage
│   ├── __init__.py
│   └── mymodule.py
└── test_notebook

it is possible to import the module mymodule in the package mypackage from test_notebook simply by executing the following statement:

# This is test_notebook in the above filetree
from mypackage.mymodule import MyClass

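For completeness, mymodule.py only needs to define the class at module level for the import above to work. MyClass itself is never shown in the question, so the body below is purely a placeholder:

```python
# mypackage/mymodule.py -- MyClass is not shown in the question,
# so this is a placeholder definition to make the import concrete.
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"
```

With an empty mypackage/__init__.py alongside it, the statement from mypackage.mymodule import MyClass then works from test_notebook.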
Upvotes: 3

lsmor

Reputation: 5063

I've solved this by using a Python egg instead of a wheel. Running python setup.py bdist_egg creates an egg, which you can install following the Databricks docs. I don't know why the wheel doesn't work...

Upvotes: 2
