Reputation: 3
I want to add some Spark configuration to my work Databricks workspace so that it gets applied to all the clusters in the workspace.
A sample global init script for this would be helpful.
Upvotes: 0
Views: 947
Reputation: 1298
You can set Spark configurations at different levels. Step 1: Try setting it at the cluster level through a global init script.
Create a sample global init script that sets the spark.sql.shuffle.partitions configuration to 100.
Open a text editor and create a new file named set-spark-config.sh
Use the code below and save the file as set-spark-config.sh
Code:

```bash
#!/usr/bin/env bash
echo "Setting Spark configuration..."
echo "spark.sql.shuffle.partitions 100" >> /databricks/spark/conf/spark-defaults.conf
Upload set-spark-config.sh to DBFS (for example under /FileStore/tables/).
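If you prefer the command line, one way to do the upload is with the Databricks CLI (this assumes the CLI is installed and configured for your workspace; the target path simply mirrors the one used in the next step):

```bash
# Copy the init script from your local machine into DBFS
databricks fs cp ./set-spark-config.sh dbfs:/FileStore/tables/set-spark-config.sh
```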
In Databricks, navigate to Admin Console / Global Init Scripts / Add Script.
Name the script, for example Set Configuration, and provide the path, for example /FileStore/tables/set-spark-config.sh.
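As an alternative to the UI, the Global Init Scripts REST API can create the script for you. The workspace URL and token below are placeholders you would need to fill in; the script body must be base64-encoded:

```bash
# Create a global init script via the REST API (workspace URL and token are placeholders).
curl -X POST "https://<your-workspace-url>/api/2.0/global-init-scripts" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d "{
        \"name\": \"Set Configuration\",
        \"script\": \"$(base64 < set-spark-config.sh | tr -d '\n')\",
        \"enabled\": true,
        \"position\": 0
      }"
```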
Once you have created the init script, it will be executed on all clusters in the workspace. The spark.sql.shuffle.partitions configuration will be set to 100 for all Spark jobs running on these clusters.
Note: Global init scripts are executed at cluster startup, so any changes to the configuration will not take effect until the clusters are restarted.
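After a cluster restarts, a quick sanity check (a suggestion, not part of the original steps) is to confirm the line was appended, for example from a %sh cell in a notebook attached to that cluster:

```bash
# Run in a %sh notebook cell on the restarted cluster
grep "spark.sql.shuffle.partitions" /databricks/spark/conf/spark-defaults.conf
```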
Step 2: In Databricks, navigate to Admin Console / Global Init Scripts / Add Script. Name the script, for example Set Configuration01. Since the script area expects a shell script, append the setting to spark-defaults.conf the same way as in Step 1 (a full script sketch follows the note below):
echo "spark.sql.execution.arrow.pyspark.enabled true" >> /databricks/spark/conf/spark-defaults.conf
Save and Enable the Script.
Note: This applies the configuration to all clusters and notebooks in the workspace.
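Putting it together, the body of Set Configuration01 could look like this (a sketch that mirrors the Step 1 script):

```bash
#!/usr/bin/env bash
echo "Setting Spark configuration..."
echo "spark.sql.execution.arrow.pyspark.enabled true" >> /databricks/spark/conf/spark-defaults.conf
```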
Upvotes: 0