lserlohn

Reputation: 6206

Best solution for version conflict in Spark program

I have a Spark program which requires several dependencies.

One dependency, a.jar, is present on the cluster as version 2.8 (a_2.8.jar), but I need to use version 2.9 (a_2.9.jar).

Every time I launch the program, Spark automatically loads a_2.8.jar from the cluster instead of a_2.9.jar, even though I have submitted my jar with --jars a_2.9.jar.

I tried the spark.executor.userClassPathFirst setting, but that causes another problem: some "secret" jar file, say "b.jar", in my user classpath doesn't work with the cluster, and since there are so many dependencies I can't tell which jar is the culprit.

To sum up:

If I use the cluster's default classpath, a.jar conflicts.

If I use userClassPathFirst, b.jar conflicts (and I don't know which jar b.jar actually is).

Could someone advise me on the best solution here that minimizes the work?

Upvotes: 3

Views: 2490

Answers (2)

Ramesh Maharjan

Reputation: 41957

Creating an uber jar with the Maven Shade plugin can be your solution. An uber jar bundles all the dependent jars into your packaged jar so that there is no conflict, and the Shade plugin can relocate/rename the packages of a conflicting jar. There are more advantages as well. More information can be found here and here.
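As a rough sketch, assuming a Maven build and using com.example.a as a placeholder for the package inside a_2.9.jar (adjust to the real package name), the Shade plugin with a relocation could be configured in pom.xml like this:

    <!-- pom.xml (sketch): build an uber jar and relocate the conflicting package -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- placeholder: the package provided by a_2.9.jar -->
                <pattern>com.example.a</pattern>
                <shadedPattern>shaded.com.example.a</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

After mvn package, your code in the uber jar references the shaded package, so it no longer clashes with the a_2.8.jar already on the cluster.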

Upvotes: 2

λ Allquantor λ

Reputation: 1131

The best solution is IMO to:

Get the dependency tree with your package manager or any other tool you want to use. For example, in Maven you could use mvn dependency:tree (see here) to double-check which dependencies could potentially cause the classpath errors, and remove them by excluding them in your build file definition, as pointed out here (see the sketch after these steps).

Then rebuild your JAR and try it again.
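As a hedged illustration of the exclusion step, assuming a Maven build and placeholder coordinates (com.example:some-library stands in for whatever artifact mvn dependency:tree shows is dragging in the unwanted 2.8 version), the exclusion in pom.xml could look like this:

    <!-- pom.xml (sketch): exclude the transitive 2.8 artifact; coordinates are placeholders -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>some-library</artifactId>
      <version>1.0.0</version>
      <exclusions>
        <exclusion>
          <groupId>com.example</groupId>
          <artifactId>a</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

With the transitive 2.8 artifact excluded from your build, only the 2.9 version you declare (or pass with --jars) ends up on your application's classpath.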

Upvotes: 1
