Vincent Tan

Reputation: 23

How to develop a Python library in Databricks without packaging and installing it after every single change?

For simplicity, say I have two Python scripts: one is main and one is lib. My question is: how can I test my lib in main without having to build and install the lib every single time?

A single file can be handled easily, as answered here ( https://stackoverflow.com/a/67280018/18105234 ). What if I have a nested library?

The idea is to do development in Databricks the way I would in JupyterLab.

Upvotes: 2

Views: 1461

Answers (1)

Alex Ott

Reputation: 87174

There are two approaches:

  1. Use %run (doc) to include the "library" notebook in the "main" notebook. You need to re-execute that %run cell after each change to the library. A full example of this approach can be found in this file.

  2. Use the new Databricks Repos functionality called arbitrary files. In this case, your library code must live in Python files, together with the corresponding __init__.py (right now you can't use notebooks for this), and you then include it as a "normal" Python package with the import statement. To automatically reload changes from the package, you need to use special magic commands, as shown in another example:

%load_ext autoreload
%autoreload 2
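To make the mechanism concrete, here is a minimal plain-Python sketch of what the autoreload magics automate for you: re-executing a module after its source file changes, done here by hand with `importlib.reload` for a single module. The nested package `mylib.utils.strings` and its `greet` function are hypothetical, and the sketch puts a temporary directory on `sys.path` itself, whereas in a Repo the repo root is assumed to already be importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files, so the reloaded module is always compiled from source.
sys.dont_write_bytecode = True


def write_package(root: Path, greeting: str) -> None:
    """Create a hypothetical nested package mylib/utils/strings.py under root."""
    pkg = root / "mylib"
    sub = pkg / "utils"
    sub.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").write_text("")
    (sub / "__init__.py").write_text("")
    (sub / "strings.py").write_text(
        f"def greet(name):\n    return '{greeting}, ' + name\n"
    )


root = Path(tempfile.mkdtemp())
write_package(root, "Hello")

# Assumption: in a Databricks Repo the repo root is typically importable already;
# here we make the temporary directory importable ourselves.
sys.path.insert(0, str(root))

import mylib.utils.strings as strings

first = strings.greet("Databricks")

# Simulate editing the library source during development.
write_package(root, "Hi there")
importlib.invalidate_caches()
strings = importlib.reload(strings)  # re-executes only this one module

second = strings.greet("Databricks")
print(first)   # Hello, Databricks
print(second)  # Hi there, Databricks
```

Note that `importlib.reload` only re-executes the module you pass it; names that were copied elsewhere with `from ... import ...` keep pointing at the old objects, which is part of what the autoreload extension tries to handle for you.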

The second approach has more advantages: it lets you take the same code and, for example, build a library from it, or apply additional code checks that aren't possible with notebooks out of the box.
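As one illustration of the kind of check that becomes easy once the library is plain Python files, here is a minimal `unittest` sketch. The `greet` function is a hypothetical stand-in defined inline so the snippet runs on its own; in a Repo you would instead import it from your package. A file like this can also be run by a test runner in a CI/CD pipeline.

```python
import unittest


# Hypothetical library function; in a Repo it might live in
# mylib/utils/strings.py and be imported with
# `from mylib.utils.strings import greet`.
def greet(name: str) -> str:
    return "Hello, " + name


class GreetTest(unittest.TestCase):
    def test_greet_prefixes_hello(self):
        self.assertEqual(greet("Databricks"), "Hello, Databricks")

    def test_greet_empty_name(self):
        self.assertEqual(greet(""), "Hello, ")


if __name__ == "__main__":
    # exit=False keeps the interpreter (or notebook kernel) alive after the run.
    unittest.main(argv=["ignored"], exit=False)
```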

P.S. My repository shows a full example of how to use Databricks Repos and test the code in notebooks from a CI/CD pipeline.

Upvotes: 1
