Nandha

Reputation: 882

Azure Databricks: Installing Maven libraries on a cluster through the API causes an error (Library resolution failed. Cause: java.lang.RuntimeException)

I am trying to install some Maven libraries on an existing or newly created Azure Databricks cluster through the REST API from Python.

Cluster details:

import json
import requests

# TOKEN (personal access token) and cluster_id (target cluster) are defined elsewhere.
spark_submit_packages = "org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3," \
                        "com.databricks:spark-redshift_2.11:3.0.0-preview1," \
                        "org.postgresql:postgresql:9.3-1103-jdbc3," \
                        "com.amazonaws:aws-java-sdk:1.11.98," \
                        "com.amazonaws:aws-java-sdk-core:1.11.98," \
                        "com.amazonaws:aws-java-sdk-sns:1.11.98," \
                        "org.apache.hadoop:hadoop-aws:2.7.3," \
                        "com.amazonaws:aws-java-sdk-s3:1.11.98," \
                        "com.databricks:spark-avro_2.11:4.0.0," \
                        "com.microsoft.azure:azure-data-lake-store-sdk:2.0.11," \
                        "org.apache.hadoop:hadoop-azure-datalake:3.0.0-alpha2," \
                        "com.microsoft.azure:azure-storage:3.1.0," \
                        "org.apache.hadoop:hadoop-azure:2.7.2"

install_lib_url = "https://<region>.azuredatabricks.net/api/2.0/libraries/install"

# Build one {"maven": {"coordinates": ...}} entry per package.
maven_packages = [
    {"maven": {"coordinates": pack}}
    for pack in spark_submit_packages.split(",")
]

headers = {
    "Authorization": "Bearer {}".format(TOKEN),
    "Content-Type": "application/json",
}

data = {
    "cluster_id": cluster_id,
    "libraries": maven_packages
}

res = requests.post(install_lib_url, headers=headers, data=json.dumps(data))
_response = res.json()
print(json.dumps(_response))

The response is an empty JSON object, which is expected.
But sometimes the library installation fails after this API call, and the cluster UI shows the following error:

Library resolution failed. Cause: java.lang.RuntimeException: commons-httpclient:commons-httpclient download failed.
    at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$5(MavenLibraryResolver.scala:253)
    at scala.collection.MapLike.getOrElse(MapLike.scala:131)
    at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
    at scala.collection.AbstractMap.getOrElse(Map.scala:63)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$4(MavenLibraryResolver.scala:253)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.databricks.libraries.server.MavenInstaller.resolveDependencyPaths(MavenLibraryResolver.scala:249)
    at com.databricks.libraries.server.MavenInstaller.doDownloadMavenPackages(MavenLibraryResolver.scala:455)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$downloadMavenPackages$2(MavenLibraryResolver.scala:381)
    at com.databricks.backend.common.util.FileUtils$.withTemporaryDirectory(FileUtils.scala:431)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$downloadMavenPackages$1(MavenLibraryResolver.scala:380)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:417)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
    at com.databricks.libraries.server.MavenInstaller.withAttributionContext(MavenLibraryResolver.scala:57)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:269)
    at com.databricks.libraries.server.MavenInstaller.withAttributionTags(MavenLibraryResolver.scala:57)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:398)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:337)
    at com.databricks.libraries.server.MavenInstaller.recordOperation(MavenLibraryResolver.scala:57)
    at com.databricks.libraries.server.MavenInstaller.downloadMavenPackages(MavenLibraryResolver.scala:379)
    at com.databricks.libraries.server.MavenInstaller.downloadMavenPackagesWithRetry(MavenLibraryResolver.scala:137)
    at com.databricks.libraries.server.MavenInstaller.resolveMavenPackages(MavenLibraryResolver.scala:113)
    at com.databricks.libraries.server.MavenLibraryResolver.resolve(MavenLibraryResolver.scala:44)
    at com.databricks.libraries.server.ManagedLibraryManager$GenericManagedLibraryResolver.resolve(ManagedLibraryManager.scala:263)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.$anonfun$resolvePrimitives$1(ManagedLibraryManagerImpl.scala:193)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.$anonfun$resolvePrimitives$1$adapted(ManagedLibraryManagerImpl.scala:188)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.resolvePrimitives(ManagedLibraryManagerImpl.scala:188)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl$ClusterStatus.installLibs(ManagedLibraryManagerImpl.scala:772)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl$InstallLibTask$1.run(ManagedLibraryManagerImpl.scala:473)
    at com.databricks.threading.NamedExecutor$$anon$1.$anonfun$run$1(NamedExecutor.scala:317)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
    at com.databricks.threading.NamedExecutor.withAttributionContext(NamedExecutor.scala:256)
    at com.databricks.threading.NamedExecutor$$anon$1.run(NamedExecutor.scala:317)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
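
Because the install call returns an empty body even when resolution fails later, the failure can also be read back through the API instead of the UI. The sketch below assumes the Libraries API 2.0 `cluster-status` endpoint and its `library_statuses` response field; the function names and the `host` parameter are illustrative, and it uses only the standard library so it runs without extra dependencies.

```python
import json
import urllib.parse
import urllib.request


def failed_libraries(status_response):
    """Return (library, messages) pairs for every library reported as FAILED."""
    failed = []
    for entry in status_response.get("library_statuses", []):
        if entry.get("status") == "FAILED":
            failed.append((entry.get("library"), entry.get("messages", [])))
    return failed


def get_cluster_library_status(host, cluster_id, token):
    """Fetch the library status of one cluster.

    host is the workspace URL, e.g. "https://<region>.azuredatabricks.net".
    """
    url = "{}/api/2.0/libraries/cluster-status?{}".format(
        host, urllib.parse.urlencode({"cluster_id": cluster_id})
    )
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer {}".format(token)}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `failed_libraries(get_cluster_library_status(host, cluster_id, TOKEN))` a while after the install request should list which coordinates failed and why, which helps narrow the problem down to a single package.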

Is this caused by installing multiple Maven libraries in a single API call? (But the API expects a list of libraries anyway :| )

[Screenshot: cluster libraries]

EDIT: This issue also occurs when restarting the cluster. Say I have manually installed 10 Maven libraries on a cluster and all of the installations succeeded. When I restart the cluster, even those previously successful installations fail.

Upvotes: 2

Views: 4826

Answers (1)

Nandha

Reputation: 882

Got the following response from the Azure support team:

There seems to be a problem with one particular Maven jar (org.apache.hadoop:hadoop-azure-datalake:3.0.0-alpha2).

Workaround:
1. Download the jar from the Maven repository.
2. Upload it to DBFS.
3. Create the library from the jar in DBFS.
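
The steps above can be sketched roughly as follows. This is only an illustration, not what the support team provided: it assumes the standard Maven Central URL layout for step 1 and the `jar` library type of the Libraries API 2.0 install endpoint for step 3. The helper names and the DBFS path are hypothetical, and the upload itself (step 2) is left to the UI, the CLI, or the DBFS API.

```python
def maven_central_jar_url(coordinates):
    """Build the Maven Central download URL for "group:artifact:version"."""
    group_id, artifact_id, version = coordinates.split(":")
    return "https://repo1.maven.org/maven2/{}/{}/{}/{}-{}.jar".format(
        group_id.replace(".", "/"), artifact_id, version, artifact_id, version
    )


def jar_install_payload(cluster_id, dbfs_jar_path):
    """Build the request body that installs a jar already uploaded to DBFS."""
    return {"cluster_id": cluster_id, "libraries": [{"jar": dbfs_jar_path}]}


# Step 1: where to download the problematic jar from.
jar_url = maven_central_jar_url(
    "org.apache.hadoop:hadoop-azure-datalake:3.0.0-alpha2"
)

# Step 3: body for POST /api/2.0/libraries/install.
# The DBFS path and cluster ID below are hypothetical placeholders.
payload = jar_install_payload(
    "<cluster-id>",
    "dbfs:/FileStore/jars/hadoop-azure-datalake-3.0.0-alpha2.jar",
)
```

The payload can be posted with the same `requests.post` call as in the question, with the other packages still passed as `maven` entries. As far as I know, a jar installed this way is used as-is, so any transitive dependencies it needs are not resolved automatically and may have to be added separately.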

Upvotes: 3
