Reputation: 3
I want to add some Spark configuration to my work Databricks workspace so that it gets applied to all the clusters in the workspace.
A sample global init script for this would be helpful.
Upvotes: 0
Views: 947
Reputation: 1298
You can set Spark configurations at different levels. Step 1: Try setting it at the cluster level through a global init script.
Create a sample global init script that sets the spark.sql.shuffle.partitions configuration to 100.
Open a text editor and create a new file named set-spark-config.sh
Use the code below and save the file as set-spark-config.sh
Code:

```bash
#!/usr/bin/env bash
echo "Setting Spark configuration..."
echo "spark.sql.shuffle.partitions 100" >> /databricks/spark/conf/spark-defaults.conf
Upload set-spark-config.sh to DBFS (for example under /FileStore/tables/).
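If you prefer the command line, one way to do the upload is with the Databricks CLI (this assumes the CLI is installed and configured for your workspace; the target path simply mirrors the one used in the next step):

```bash
# Copy the init script from your local machine into DBFS
databricks fs cp ./set-spark-config.sh dbfs:/FileStore/tables/set-spark-config.sh
```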
In Databricks, navigate to Admin Console / Global Init Scripts / Add Script.
Name the script, for example Set Configuration, and provide the path, for example /FileStore/tables/set-spark-config.sh.
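As an alternative to the UI, the Global Init Scripts REST API can create the script for you. The workspace URL and token below are placeholders you would need to fill in; the script body must be base64-encoded:

```bash
# Create a global init script via the REST API (workspace URL and token are placeholders).
curl -X POST "https://<your-workspace-url>/api/2.0/global-init-scripts" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d "{
        \"name\": \"Set Configuration\",
        \"script\": \"$(base64 < set-spark-config.sh | tr -d '\n')\",
        \"enabled\": true,
        \"position\": 0
      }"
```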
Once you have created the init script, it will be executed on all clusters in the workspace. The spark.sql.shuffle.partitions configuration will be set to 100 for all Spark jobs running on these clusters.
Note: Global init scripts are executed at cluster startup, so any changes to the configuration will not take effect until the clusters are restarted.
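After a cluster restarts, a quick sanity check (a suggestion, not part of the original steps) is to confirm the line was appended, for example from a %sh cell in a notebook attached to that cluster:

```bash
# Run in a %sh notebook cell on the restarted cluster
grep "spark.sql.shuffle.partitions" /databricks/spark/conf/spark-defaults.conf
```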
Step 2: In Databricks, navigate to Admin Console / Global Init Scripts / Add Script. Name the script, for example Set Configuration01. Since the script area expects a shell script, append the setting to spark-defaults.conf the same way as in Step 1 (a full script sketch follows the note below):
echo "spark.sql.execution.arrow.pyspark.enabled true" >> /databricks/spark/conf/spark-defaults.conf
Save and Enable the Script.
Note: This applies the configuration to all clusters and notebooks in the workspace.
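Putting it together, the body of Set Configuration01 could look like this (a sketch that mirrors the Step 1 script):

```bash
#!/usr/bin/env bash
echo "Setting Spark configuration..."
echo "spark.sql.execution.arrow.pyspark.enabled true" >> /databricks/spark/conf/spark-defaults.conf
```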
Upvotes: 0