Reputation: 1214
In our project we're using com.typesafe:config in version 1.3.4. According to the latest release notes, this dependency is already provided by Databricks on the cluster, but in a very old version (1.2.1). How can I overwrite the provided dependency with our own version?
We use maven, in our dependencies I have
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.4</version>
</dependency>
Our created jar file should therefore contain the newer version.
I created a Job by uploading the jar file. The Job fails because it can't find a method that was added after version 1.2.1, so it looks like the library we provided gets overwritten by the older version on the cluster.
Upvotes: 8
Views: 1746
Reputation: 51
Databricks supports the initialization script (cluster scope or global scope) so that you can install/remove any dependency. The details are at https://docs.databricks.com/clusters/init-scripts.html.
In your initialization script, you can remove the default jar file locates at databricks driver/executor classpath /databricks/jars/
and add the expected one there.
Upvotes: 0
Reputation: 1214
We solved it in the end by utilizing Sparks ChildFirstURLClassLoader. The project is open source so you can check it out yourself here and the usage of the method here.
But for reference, here is the method in its entirety. You need to provide a Seq of jars that you want to override with your own, in our case it's the typesafe config.
def getChildFirstClassLoader(jars: Seq[String]): ChildFirstURLClassLoader = {
val initialLoader = getClass.getClassLoader.asInstanceOf[URLClassLoader]
@tailrec
def collectUrls(clazz: ClassLoader, acc: Map[String, URL]): Map[String, URL] = {
val urlsAcc: Map[String, URL] = acc++
// add urls on this level to accumulator
clazz.asInstanceOf[URLClassLoader].getURLs
.map( url => (url.getFile.split(Environment.defaultPathSeparator).last, url))
.filter{ case (name, url) => jars.contains(name)}
.toMap
// check if any jars without URL are left
val jarMissing = jars.exists(jar => urlsAcc.get(jar).isEmpty)
// return accumulated if there is no parent left or no jars are missing anymore
if (clazz.getParent == null || !jarMissing) urlsAcc else collectUrls(clazz.getParent, urlsAcc)
}
// search classpath hierarchy until all jars are found or we have reached the top
val urlsMap = collectUrls(initialLoader, Map())
// check if everything found
val jarsNotFound = jars.filter( jar => urlsMap.get(jar).isEmpty)
if (jarsNotFound.nonEmpty) {
logger.info(s"""available jars are ${initialLoader.getURLs.mkString(", ")} (not including parent classpaths)""")
throw ConfigurationException(s"""jars ${jarsNotFound.mkString(", ")} not found in parent class loaders classpath. Cannot initialize ChildFirstURLClassLoader.""")
}
// create child-first classloader
new ChildFirstURLClassLoader(urlsMap.values.toArray, initialLoader)
}
As you can see, it also has some logic to abort if the jar files you specified do not exist in the classpath.
Upvotes: 3
Reputation: 339
In the end we have fixed this by shading the relevant classes, by adding the following to our build.sbt
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.typesafe.config.**" -> "shadedSparkConfigForSpark.@1").inAll
)
Upvotes: 3