Reputation: 1155
I am trying to run a fat jar through spark-submit on EMR and I am running into a problem with package dependencies. The project depends on the Google AdWords library, which I have included in build.sbt. The problem is that the AdWords library internally depends on a package called commons-configuration, version 1.10, but when I run the jar on EMR through spark-submit (which runs via the YARN scheduler), version 1.6 of commons-configuration is used instead, since that version is part of the CLASSPATH on the EMR cluster. I get the following error:
java.lang.NoSuchMethodError: org.apache.commons.configuration.MapConfiguration
I have tried passing the dependency jar explicitly using the --jars option of spark-submit:
spark-submit --name my-awesome-spark-job --deploy-mode cluster --class package.path.to.my.Main --jars s3://jar-bucket/jars/commons-configuration-1.10.jar s3://code-bucket/jars/spark-code.jar
Doing this still gives me the same error: the older version of the package from the cluster CLASSPATH is used no matter what.
I would like to force the fat jar to include this dependency and have it used explicitly for certain libraries, e.g. the Google AdWords library here. Thanks.
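For reference, the relevant part of my build.sbt looks roughly like this (the AdWords coordinates are the usual Maven Central ones; the version numbers here are illustrative):

// build.sbt (sketch; versions are illustrative)
libraryDependencies ++= Seq(
  // Google AdWords client; internally requires commons-configuration 1.10
  "com.google.api-ads" % "adwords-axis" % "4.1.0",
  // pin the version the AdWords client expects
  "commons-configuration" % "commons-configuration" % "1.10"
)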
Upvotes: 0
Views: 1085
Reputation: 497
You could try shading the dependencies that you use and that have an older version available on the cluster.
What do you use to build the jar? I've used this strategy with sbt: https://github.com/sbt/sbt-assembly#shading
There is also a shade plugin for Maven: https://maven.apache.org/plugins/maven-shade-plugin/
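With sbt-assembly, for example, a shade rule for commons-configuration could look like this (a minimal sketch; the shaded.commons.configuration prefix is an arbitrary name of your choosing):

// in build.sbt, with the sbt-assembly plugin enabled
assemblyShadeRules in assembly := Seq(
  // rename the bundled commons-configuration classes so they cannot clash
  // with the 1.6 version already on the cluster classpath
  ShadeRule.rename("org.apache.commons.configuration.**" -> "shaded.commons.configuration.@1").inAll
)

sbt assembly then rewrites both the classes and all references to them inside the fat jar, so the AdWords code ends up calling the bundled 1.10 classes instead of the 1.6 ones provided by EMR.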
Upvotes: 1