Reputation: 702
We are writing notebooks in Databricks. When we push them to git, we want to run flake8 on them to check for new problems in the code.
Because Databricks provides some predefined variables, those names are undefined in the code itself. Is it possible to filter out errors like:
F821 undefined name 'dbutils'
While keeping errors like
F821 undefined name 'my_var'
I am aware of the --ignore parameter, but as far as I understand it would only let us exclude F821 in general, not for a specific variable name.
Thanks
Upvotes: 2
Views: 482
Reputation: 81
Add the following at the beginning of the notebook:
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils
spark = SparkSession.getActiveSession()
dbutils = DBUtils(spark)
Optionally, install databricks-connect instead of pyspark in your local environment so that pyspark.dbutils can be resolved. (Flake8 does not check whether imports resolve, but other tools such as VS Code's Pylance do.)
This also gives code completion in external editors such as VS Code.
Upvotes: 0
Reputation: 69844
You can declare additional builtins with the --builtins option (or the builtins setting in a flake8 configuration file such as setup.cfg, tox.ini or .flake8):
$ cat t2.py
db_utils.wat()
my_var.wat()
$ flake8 t2.py
t2.py:1:1: F821 undefined name 'db_utils'
t2.py:2:1: F821 undefined name 'my_var'
$ flake8 t2.py --builtins db_utils
t2.py:2:1: F821 undefined name 'my_var'
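If the flake8 invocation itself cannot be changed (for example, a shared CI configuration), the filtering the question asks about can be approximated by post-processing flake8's output. The following is a minimal sketch, assuming flake8's default output format `path:row:col: CODE message`; the allow-list `DATABRICKS_GLOBALS` and the functions `filter_f821` and `main` are hypothetical names, not part of flake8 or Databricks.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical allow-list: names assumed to be injected into Databricks notebooks.
DATABRICKS_GLOBALS = frozenset({"dbutils", "spark", "display"})

# Matches flake8's default format, e.g. "t2.py:1:1: F821 undefined name 'dbutils'".
_F821 = re.compile(r"^.+?:\d+:\d+: F821 undefined name '(?P<name>[^']+)'$")


def filter_f821(lines: Iterable[str], allowed: frozenset = DATABRICKS_GLOBALS) -> List[str]:
    """Drop F821 reports for allowed names; keep every other line unchanged."""
    kept = []
    for line in lines:
        match = _F821.match(line.rstrip("\n"))
        if match and match.group("name") in allowed:
            continue  # known notebook global, not a real error
        kept.append(line)
    return kept


def main() -> int:
    """Read flake8 output from stdin, echo the surviving lines, return an exit code."""
    remaining = filter_f821(sys.stdin)
    sys.stdout.writelines(remaining)
    # Non-zero only if real problems remain, so a CI step still fails on them.
    return 1 if remaining else 0
```

To use it as a script, append `if __name__ == "__main__": raise SystemExit(main())` and pipe flake8 into it, e.g. `flake8 notebooks/ | python filter_f821.py`. The --builtins option remains the simpler choice where it is available, since it needs no extra tooling.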
Upvotes: 2