Xiangyu

Reputation: 844

sbt: using local jar without breaking the dependencies

I am building an application that uses Spark and Spark MLlib. The build.sbt declares the dependencies as follows:

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.0" withSources() withJavadoc(),
    "org.apache.spark" %% "spark-mllib" % "1.6.0" withSources() withJavadoc()
  )

This works fine. Now I would like to change some code in MLlib and recompile the application with sbt. Here is what I did:

  1. Download the source code of Spark 1.6.0, modify the code in mllib, and recompile it into a jar named spark-mllib_2.10-1.6.0.jar.
  2. Put the aforementioned jar into the lib directory of the project.
  3. Also put spark-core_2.10-1.6.0.jar into the lib directory of the project.
  4. Delete the libraryDependencies statement from the build.sbt file.
  5. Run sbt clean package.

However, this doesn't compile because of the missing dependencies that spark-core and spark-mllib need in order to run; sbt manages those dependencies automatically only when the libraryDependencies statement is present in build.sbt.

So I put the libraryDependencies statement back into build.sbt, hoping that sbt would resolve the dependency issues and still use the local spark-mllib instead of the one from the remote repository. However, running my application showed that this was not the case.

So I am wondering: is there a way to use my local spark-mllib jar without manually resolving the dependency issues?

UPDATE: I followed the first approach of Roberto Congiu's answer and successfully built the package using the following build.sbt:

  lazy val commonSettings = Seq(
    scalaVersion := "2.10.5",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0" withSources() withJavadoc(),
      "org.apache.spark" %% "spark-streaming" % "1.6.0" withSources() withJavadoc(),
      "org.apache.spark" %% "spark-sql" % "1.6.0" withSources() withJavadoc(),
      "org.scalanlp" %% "breeze" % "0.11.2"
    )
  )

  lazy val core = project.
    settings(commonSettings: _*).
    settings(
      name := "xSpark",
      version := "0.01"
    )

  lazy val example = project.
    settings(commonSettings: _*).
    settings(
      name := "xSparkExample",
      version := "0.01"
    ).
    dependsOn(core)

xSparkExample includes a KMeans example that calls xSpark, and xSpark calls the KMeans function in spark-mllib. This spark-mllib is a customized jar that I put in the core/lib directory so that sbt picks it up as a local unmanaged dependency.

However, running my application still doesn't use the customized jar for some reason. I even ran find . -name "spark-mllib_2.10-1.6.0.jar" to make sure no other copy of the jar exists on my system.
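
One way to narrow this down is to ask sbt to print the classpath it is actually compiling and running against; assuming sbt 0.13 (as with the build above), the compile-scope classpath of each subproject can be shown from the sbt shell:

  show core/compile:dependencyClasspath
  show example/compile:dependencyClasspath

If the jar from core/lib appears there but the customized code is still not used at run time, the launcher may be the reason: spark-submit in Spark 1.6 puts the Spark assembly, which already bundles the stock MLlib classes, ahead of the application's own dependencies.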

Upvotes: 2

Views: 3296

Answers (1)

Roberto Congiu

Reputation: 5213

One way to do it is to have your custom mllib as an unmanaged dependency. Unmanaged dependencies are simply jars placed in a directory; SBT picks them up as they are, so you are also responsible for providing their dependencies. You can read about unmanaged dependencies here: http://www.scala-sbt.org/0.13/docs/Library-Dependencies.html
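
As an illustration of what the linked page describes: jars dropped into lib/ are placed on all of the project's classpaths, and the directory sbt scans can be changed through unmanagedBase. A minimal sketch, assuming sbt 0.13:

  // build.sbt
  // lib/ is scanned for unmanaged jars by default; unmanagedBase points sbt elsewhere.
  unmanagedBase := baseDirectory.value / "custom_lib"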

So, you can try the following:

  1. Create a lib directory and add your custom mllib jar there. That's the default location for unmanaged libs, and sbt will pick it up automatically.
  2. In your build.sbt, remove the reference to mllib and add all of its dependencies, which are listed in the pom here: https://github.com/apache/spark/blob/master/mllib/pom.xml . You can skip the ones with test scope (a sketch follows this list).
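
Here is a sketch of what step 2 can look like; the versions are only illustrative (Spark 1.6.0 and breeze 0.11.2, matching the question's update), and the mllib pom also lists spark-graphx at compile scope:

  // build.sbt (sketch): the custom spark-mllib jar sits in lib/, so it is not
  // declared here; its compile-scope dependencies are declared explicitly instead.
  scalaVersion := "2.10.5"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "1.6.0",
    "org.apache.spark" %% "spark-streaming" % "1.6.0",
    "org.apache.spark" %% "spark-sql"       % "1.6.0",
    "org.apache.spark" %% "spark-graphx"    % "1.6.0",  // listed in the mllib pom
    "org.scalanlp"     %% "breeze"          % "0.11.2"
  )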

Another way to do it is to have your own Maven repository (for example, Artifactory) where you put your custom artifacts, and have sbt pull from that repository first. This has the advantage that other people will be able to build the code and use your custom mllib library.
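
A rough sketch of that setup, with a placeholder repository URL and a made-up 1.6.0-custom version so the rebuilt jar cannot collide with the upstream artifact:

  // build.sbt (sketch; repository URL and version are placeholders)
  resolvers += "My Artifactory" at "https://artifactory.example.com/libs-release-local"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"  % "1.6.0",
    // custom build of spark-mllib published to the private repository
    "org.apache.spark" %% "spark-mllib" % "1.6.0-custom"
  )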

Upvotes: 0
