Morshed

Reputation: 225

Job failed using spark-submit with parameters when the job runs in Azure Databricks: INIT_SCRIPT_FAILURE (CLIENT_ERROR)

I created a simple .NET Core 3.1 project and ran it locally successfully. I wanted to run this application as a job on an Azure Databricks cluster, so I followed the instructions in the Microsoft documentation, but the job failed.

https://learn.microsoft.com/en-us/previous-versions/dotnet/spark/tutorials/databricks-deployment

using Microsoft.Spark.Sql;

// Create (or reuse) the Spark session for this application
SparkSession spark = SparkSession
    .Builder()
    .AppName("HelloWorldSparkApp")
    .GetOrCreate();

// Create a DataFrame with a single row and single column
DataFrame df = spark.Sql("SELECT 'Hello, World!' AS message");

// Show the DataFrame
df.Show();

// Stop the Spark session
spark.Stop();

Cluster configuration


I copied db-init.sh to the Workspace -> Shared folder because there is no DBFS option.

Job configuration

Publish application:
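The publish step followed the tutorial: a self-contained publish for the cluster's Linux workers, zipped for upload. Roughly (the runtime identifier and zip name are from my setup and may differ for yours):

# Publish the app self-contained for the cluster's Linux nodes
dotnet publish -c Release -f netcoreapp3.1 -r linux-x64

# Zip the publish output for upload to Databricks
cd bin/Release/netcoreapp3.1/linux-x64/publish
zip -r HelloSparkCore31.zip .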

Databricks dbfs/spark-dotnet folder content as per MS documentation:

- db-init.sh
- install-worker.sh
- microsoft-spark-3-2_2.12-2.1.1.jar
- Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-2.1.1.tar.gz
- HelloSparkCore31.zip
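The job itself is a spark-submit task whose parameters point DotnetRunner at those files, roughly equivalent to the following (file names are from my upload, and the last argument is the published executable name, assuming I read the tutorial correctly):

spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  /dbfs/spark-dotnet/microsoft-spark-3-2_2.12-2.1.1.jar \
  /dbfs/spark-dotnet/HelloSparkCore31.zip \
  HelloSparkCore31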

Job creation is okay, but I get an exception when the job starts.

Exception as per job output

Cluster '0901-204609-m9yiukve' was terminated. Reason: INIT_SCRIPT_FAILURE (CLIENT_ERROR). Parameters: instance_id:93d671e9f0884221b689a09b125d2655, databricks_error_message:Cluster scoped init script /Shared/db-init.sh failed: Script exit status is non-zero.


I am still in the learning stage with Databricks. I searched Google a lot but could not resolve this.

Any kind of help or hints would be greatly appreciated.

Upvotes: 0

Views: 84

Answers (1)

JayashankarGS

Reputation: 8160

First, make sure you have altered db-init.sh appropriately: create whatever folders the script needs and use supported versions.

That is, install-worker.sh should be in /dbfs/spark-dotnet as the script expects. However, running init scripts from a DBFS location is deprecated, so use a different root folder for spark-dotnet.
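For reference, the part of db-init.sh that depends on that folder looks roughly like this. This is a sketch based on the tutorial's script, not your exact file, and the SPARK_DOTNET_ROOT variable name is illustrative:

#!/bin/bash
# Worker release to install on every node (use a supported version)
DOTNET_SPARK_RELEASE=https://github.com/dotnet/spark/releases/download/v2.1.1/Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-2.1.1.tar.gz

# Where the worker binaries are installed on each node
DOTNET_SPARK_WORKER_INSTALLATION_PATH=/usr/local/bin

# Root folder holding the uploaded deployment files; this is the path to
# change when moving off the deprecated DBFS location
SPARK_DOTNET_ROOT=/dbfs/spark-dotnet

# Copy the installer locally, make it executable, and run it
cp "$SPARK_DOTNET_ROOT/install-worker.sh" /tmp/install-worker.sh
chmod +x /tmp/install-worker.sh
/tmp/install-worker.sh github "$DOTNET_SPARK_RELEASE" "$DOTNET_SPARK_WORKER_INSTALLATION_PATH"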

Also use the latest DOTNET_SPARK_RELEASE, which is Release .NET for Apache Spark v2.1.1 · dotnet/spark · GitHub.

So, alter DOTNET_SPARK_RELEASE in your script:

DOTNET_SPARK_RELEASE=https://github.com/dotnet/spark/releases/download/v2.1.1/Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-2.1.1.tar.gz

Also check the logs in dbfs:/cluster-logs/ for more detail about the execution of these scripts.
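For example, with the Databricks CLI, assuming the cluster delivers logs to dbfs:/cluster-logs (the cluster id below comes from your error message):

# List the init script logs for the failed cluster
databricks fs ls dbfs:/cluster-logs/0901-204609-m9yiukve/init_scripts/

# Copy them locally to inspect the stdout/stderr of db-init.sh
databricks fs cp --recursive dbfs:/cluster-logs/0901-204609-m9yiukve/init_scripts/ ./init-logs/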

Upvotes: 0
