Reputation: 1604

Run a notebook from another notebook in a Databricks Repo

I have a notebook with functions in a repo folder that I am trying to run in another notebook.

Normally I can run it as such: %run /Users/name/project/file_name

So I cloned the two files (function_notebook, processed_notebook) into a Repo in Databricks.

When I try to copy the path where I just cloned it, only this option appears: Copy File Path relative to Root

However, in the Workspace user folder the option is Copy File Path.

Evidently I don't quite grasp the difference between the relative path and the workspace path.

How can I run the notebook that has been cloned into the repo?

Hierarchy:

RepoName (has 2 folders):
    Folder1/Notebook1
    Folder2/Notebook2

I'm in Notebook1 and want to run Notebook2:

%run ../Folder2/Notebook2

Upvotes: 7

Answers (3)

George Sotiropoulos

Reputation: 2123

For the record, here is some code you can execute in a notebook to "update" another repo folder and then execute it. I believe it does what the accepted answer describes, using the Databricks API from within a Databricks notebook.

import json

from databricks_cli.repos.api import ReposApi
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.workspace.api import WorkspaceApi

# Read the API URL and token of the current notebook session from its context
context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
url = context['extraContext']['api_url']
token = context['extraContext']['api_token']

api_client = ApiClient(host=url, token=token)

repo_url = "https://<user>@<host>/your_repo_url"  # same URL you use to clone
repos_path = "/Repos/your_repo/"
repo_path = repos_path + "your_repo"
repos_api = ReposApi(api_client)
workspace_api = WorkspaceApi(api_client)

# 1. Create the parent folder if it doesn't exist
workspace_api.mkdirs(repos_path)

# 2. If the repo already exists, delete it and create it again,
#    so that you are sure to get the updated branch you want
try:
    repo_id = repos_api.get_repo_id(repo_path)
    repos_api.delete(repo_id)
except RuntimeError:
    pass  # repo not found: nothing to delete

# 3. Clone the repo again and check out the desired branch
repos_api.create(url=repo_url, path=repo_path, provider='azureDevOpsServices')
repos_api.update(repo_id=repos_api.get_repo_id(repo_path), branch='master', tag=None)

What it does:

First it connects using the notebook context. Then it deletes the target folder if it exists, re-creates it, and updates it (the update is probably redundant). I delete the existing folder to avoid conflicts with local changes: if someone made changes in the target Repo folder and you only update, you pull the changes from origin but don't remove the changes that already exist there. Deleting and re-creating is like resetting the folder.

That way you can execute a notebook from another repo.

Alternatively, you can create a job in Databricks and use the Databricks API to run it. However, you would have to create a separate job for each notebook you want to execute.
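A minimal sketch of the job-based alternative, assuming a job already exists for the target notebook and that it is triggered through the Jobs API 2.1 `run-now` endpoint. The host, token, and job ID are placeholders (the host and token could come from the notebook context, as in the code above), and `build_run_now_request` / `run_job` are hypothetical helper names. Building the request separately from sending it lets you inspect it without a workspace.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a Jobs API 2.1 run-now request for an existing job."""
    url = host.rstrip("/") + "/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_job(host: str, token: str, job_id: int) -> int:
    """Trigger the job and return the run_id from the JSON response."""
    request = build_run_now_request(host, token, job_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["run_id"]
```

Inside a notebook this would be called as `run_job(url, token, <your job id>)`; the one-job-per-notebook drawback mentioned above still applies.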

Upvotes: 0

gunn

Reputation: 321

My notebook is called "UserLibraries", and I successfully ran it in a separate cell without any other commands. Maybe that is the problem: %run must be in a cell by itself. Also, if the path is correct, it becomes a hyperlink, and I can open the called notebook in a new browser window by clicking it (see picture).

Upvotes: 3

Alex Ott

Reputation: 87079

It's a UI problem that has already been reported to the development team. Until it's fixed, you need to construct the path yourself. The difference is that the path starts with /Repos, not with /Users. I have a small demo that shows how to use Repos to perform testing, etc., if you're interested in details.
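A small sketch of constructing that /Repos path yourself, assuming the default layout `/Repos/<user>/<repo>/<path inside repo>`. The helpers `to_workspace_path` and `split_repo_path`, as well as the user and repo names, are hypothetical illustrations rather than a Databricks API. The split helper also shows how a full workspace path relates to a path relative to the repo root.

```python
import posixpath
from typing import Tuple


def to_workspace_path(user: str, repo: str, path_in_repo: str) -> str:
    """Build the full workspace path of a notebook inside a repo.

    Assumes the default layout /Repos/<user>/<repo>/<path inside repo>.
    """
    return posixpath.join("/Repos", user, repo, path_in_repo.lstrip("/"))


def split_repo_path(workspace_path: str) -> Tuple[str, str]:
    """Split a /Repos workspace path into (repo root, path relative to the repo root)."""
    parts = workspace_path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "Repos":
        raise ValueError(f"not a /Repos/<user>/<repo>/... path: {workspace_path}")
    return "/" + "/".join(parts[:3]), "/".join(parts[3:])


full_path = to_workspace_path("me@example.com", "RepoName", "Folder2/Notebook2")
print(full_path)  # /Repos/me@example.com/RepoName/Folder2/Notebook2
print(split_repo_path(full_path))  # ('/Repos/me@example.com/RepoName', 'Folder2/Notebook2')
```

Under that assumption, the first printed value is what would go after `%run` in place of a /Users path.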

But if the files are inside the same repository, you don't need full paths, which make your code less portable. You can use relative paths instead: ./file_name to include a notebook from the current folder, ../file_name to include one from the parent folder, or ./folder/file_name to include one from a subfolder. Don't specify the file extension. This way your code is portable and can be used in different checkouts.

Example: [screenshots of Notebook2 and Notebook1]

The main difference between the workspace path and the relative path is that the former gives you the full path inside the Workspace, while the latter gives you the path relative to the root of the Repo.
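To connect the two kinds of path: a relative `%run` target such as `../Folder2/Notebook2` ends up pointing at a full workspace path. The sketch below mimics that resolution with `posixpath`, assuming the target is resolved against the folder containing the calling notebook (the usual relative-path rule). The `resolve_run_path` helper and the example paths are hypothetical, not a Databricks API.

```python
import posixpath


def resolve_run_path(caller_notebook: str, target: str) -> str:
    """Resolve a %run target against the calling notebook's folder.

    Absolute targets (starting with '/') are only normalized; relative ones are
    joined to the caller's parent folder, and '.' / '..' segments are collapsed.
    """
    if target.startswith("/"):
        return posixpath.normpath(target)
    caller_folder = posixpath.dirname(caller_notebook)
    return posixpath.normpath(posixpath.join(caller_folder, target))


caller = "/Repos/me@example.com/RepoName/Folder1/Notebook1"
print(resolve_run_path(caller, "../Folder2/Notebook2"))
# /Repos/me@example.com/RepoName/Folder2/Notebook2
```

Because only the caller's location changes between checkouts, the same relative target keeps resolving to the matching notebook in each checkout, which is why relative paths are the more portable choice.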

Upvotes: 4
