Oleksandr Kovalov

Reputation: 43

Databricks job is canceled when azure-cosmos-spark maven library is installed

I've been using com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.0.0, installed on a cluster with runtime 8.3.x-scala2.12, for a long time. It suddenly stopped working, and Databricks jobs run on a cluster with this library installed are canceled (see screenshot of the canceled Databricks job).

The cluster driver's stderr log contains the following error: ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
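As a side note, the ANTLR runtime version actually loaded on the driver can be confirmed from a notebook cell on the same cluster (a minimal check, assuming a Databricks Python notebook where spark is predefined; RuntimeMetaData.VERSION is the public version field the ANTLR 4 runtime exposes):

# Run in a notebook cell attached to the affected cluster.
# 'spark' is predefined by Databricks; we reach into the JVM via py4j
# and read the static VERSION field of the ANTLR runtime on the classpath.
antlr_version = spark.sparkContext._jvm.org.antlr.v4.runtime.RuntimeMetaData.VERSION
print("ANTLR runtime on driver classpath:", antlr_version)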

I've tried updating the library and the cluster runtime versions, and also installing the JAR library instead of the Maven package, but it didn't help. (A scripted version of one such attempt is sketched below.)
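For anyone reproducing this, installing a specific connector version can be scripted against the Databricks Libraries API (a sketch: the host, token, and cluster id are placeholders, and 4.3.1 merely stands in for a newer 4.x release; whether it resolves the ANTLR mismatch is an assumption, not something I've confirmed):

import requests

DATABRICKS_HOST = "https://<databricks-instance>"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder
CLUSTER_ID = "<cluster-id>"                        # placeholder

# POST /api/2.0/libraries/install attaches libraries to a running cluster.
payload = {
    "cluster_id": CLUSTER_ID,
    "libraries": [{
        # The _3-1_2-12 artifact targets Spark 3.1 / Scala 2.12, matching DBR 9.1.
        # 4.3.1 is an assumed newer release, not a confirmed fix.
        "maven": {
            "coordinates": "com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.1"
        }
    }],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print("Install requested, HTTP", resp.status_code)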

My cluster now has the following configuration:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 2
    },
    "cluster_name": "test-clstr002",
    "spark_version": "9.1.x-scala2.12",
    "spark_conf": {
        "spark.databricks.delta.preview.enabled": "true"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_F4s",
    "driver_node_type_id": "Standard_F4s",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {},
    "autotermination_minutes": 60,
    "enable_elastic_disk": true,
    "cluster_source": "API",
    "init_scripts": []
}

Here is a screenshot of the installed azure-cosmos-spark Maven library.

Thank you for any help or suggestions!

Upvotes: 1

Views: 294

Answers (1)

Vamsi Bitra

Reputation: 2764

When you install a conflicting version of a library as a Maven dependency on your Spark cluster, jobs on that cluster are cancelled (in a Python notebook, the run simply returns "Cancelled"). Your app should be able to use the required connector libraries, but currently, if you specify the Cosmos DB Spark connector's Maven coordinates as a dependency for the cluster, you will get this error.
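You can check which library versions are actually attached to the cluster through the Libraries API (a sketch; the host, token, and cluster id are placeholders):

import requests

# GET /api/2.0/libraries/cluster-status lists every library attached to the
# cluster together with its install status.
resp = requests.get(
    "https://<databricks-instance>/api/2.0/libraries/cluster-status",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"cluster_id": "<cluster-id>"},
)
resp.raise_for_status()
for lib in resp.json().get("library_statuses", []):
    print(lib["library"], "->", lib["status"])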

Solution: follow the guidance in the referenced documentation.

Reference:

https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/cosmosdb-connector

Upvotes: 0
