Reputation: 6206
I have a Spark program which requires several dependencies.
One dependency, a.jar, is at version 2.8 on the cluster (a_2.8.jar); however, I need version 2.9 (a_2.9.jar).
Every time I launch the program, Spark automatically loads a_2.8.jar from the cluster instead of a_2.9.jar, even though I have submitted the jar with --jars a_2.9.jar.
I tried the spark.executor.userClassPathFirst setting, but there is another problem: there is a "secret" jar file, say "b.jar", in my user class path that doesn't work with the cluster, and with so many dependencies I don't know which jar is the broken one.
To sum up:
If I use the cluster's default class path, a.jar conflicts.
If I use userClassPathFirst, b.jar conflicts (and I don't know which jar b.jar actually is).
I wish someone could advise me on the best solution here, one that minimizes the work.
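For reference, the submit command looks roughly like this (the main class and application jar names are placeholders for my own code):

    # com.example.MyApp and my-app.jar are placeholder names
    spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --jars a_2.9.jar \
      --conf spark.executor.userClassPathFirst=true \
      my-app.jar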
Upvotes: 3
Views: 2490
Reputation: 41957
Creating an uber jar with the shade plugin can be your solution. An uber jar bundles all the dependent jars into your packaged jar so that there is no conflict, and the shade plugin can also relocate/rename a conflicting dependency. There are more advantages. More information can be found here and here
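A minimal sketch of the maven-shade-plugin with a relocation; the com.example.a package pattern and the plugin version are placeholders you would adapt to the packages actually provided by a_2.9.jar:

    <!-- pom.xml: shade the application and relocate the conflicting packages -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- com.example.a stands in for the packages inside a_2.9.jar -->
                <pattern>com.example.a</pattern>
                <shadedPattern>shaded.com.example.a</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

With the relocation in place, your classes reference the shaded copy of version 2.9, while the cluster's a_2.8.jar is left untouched.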
Upvotes: 2
Reputation: 1131
The best solution is, IMO, to:
Get the dependency tree with your package manager or any other tool you want to use. For example, in Maven you could use mvn dependency:tree (see here) to double-check which dependencies could potentially cause the class path errors, and remove them by excluding them in your build file definition, as pointed out here (a sketch follows below).
Then rebuild your JAR and try again.
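As a sketch, assuming Maven (the groupId/artifactId/version values are placeholders for whatever the tree shows is pulling in the bad jar):

    # print the full dependency tree; -Dverbose also shows omitted/conflicting versions
    mvn dependency:tree -Dverbose

Then exclude the offending transitive dependency in pom.xml:

    <dependency>
      <groupId>com.example</groupId>
      <artifactId>some-library</artifactId>
      <version>1.0.0</version>
      <exclusions>
        <exclusion>
          <!-- placeholder coordinates for the jar that conflicts on the cluster -->
          <groupId>com.example</groupId>
          <artifactId>b</artifactId>
        </exclusion>
      </exclusions>
    </dependency>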
Upvotes: 1