Vijayant

Reputation: 752

How to include Python and Scala files together in a jar file using sbt?

Goal:

Build a single jar containing both Scala and Python files, supply this jar to pyspark, and be able to call both the Scala and the Python code. The main execution will be in the Python files, which will use the Scala libraries internally via py4j.

How can I include Python files or packages in the jar file along with the Scala files using sbt?

Project structure (open to change to whatever works)

parent_project
|
|-- child_project
    |
    |-- src
        |
        |-- main
            |
            |-- scala
                |
                |-- com.my_org.child_project
                    |
                    |-- s_file_1.scala
                    |-- s_file_2.scala
            |-- python
                |
                |-- foo
                    |
                    |-- p_file_1.py
                    |-- p_file_2.py
    |-- build.sbt                      -- for child project
|-- build.sbt                          -- for parent project

Sample build.sbt (for child project)

name := "child_project"
version := "1.0.0"
scalaVersion := "2.11.1"
val sparkVersion = "2.4.4"

lazy val dependencies = new {}

libraryDependencies ++= Seq()

Sample build.sbt (for parent project)

lazy val child_project = project.in(file("parent_project/child_project"))
  .dependsOn(parent % "provided->provided;compile->compile;test->test;runtime->runtime")
  .settings(
    name := "child_project",
    organization := "com.my_org",
    unmanagedSourceDirectories in Compile += file("/parent_project/child_project/src/main/python"),
    includeFilter in (Compile, unmanagedSources) := "*.scala" || "*.java" || "*.py",
    assemblySettings
  )

SBT Version = 0.13.16

SBT command for building jar

"project child_project" assembly

Specific questions:

  1. Is it possible to package both Python and Scala code in a single jar?
  2. Is it possible to supply this jar to pyspark and access both the Python and the Scala files from it?
  3. Any suggestions, workarounds, or better options for achieving the goal?

Upvotes: 1

Views: 1078

Answers (1)

Jacek Laskowski

Reputation: 74709

The first solution that comes to mind is to place the .py files under the src/main/resources directory. That is something of a hack, but it may be all you need, especially for Python files.
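As a variation on the resources idea, and only as a sketch: instead of moving the files, the existing Python directory could be registered as an extra resource directory. This assumes the layout from the question (src/main/python under the child project) and the sbt 0.13-style syntax the question uses. sbt copies resource files into the jar unchanged, keeping their paths relative to the resource directory, so foo/p_file_1.py would be expected at the root of the jar.

```scala
// build.sbt for child_project (sketch, not tested against this exact build)
// Assumption: Python sources live in <child_project>/src/main/python, as in the question.
unmanagedResourceDirectories in Compile += baseDirectory.value / "src" / "main" / "python"

// On sbt 1.1 or later the same setting can be written with the slash syntax:
// Compile / unmanagedResourceDirectories += baseDirectory.value / "src" / "main" / "python"
```

Listing the jar contents after running assembly (for example with jar tf) should show whether the .py files were picked up.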

A much better solution would be to define main/python as a source directory, as described in "Add an additional source directory" in the sbt documentation:

sbt collects sources from unmanagedSourceDirectories, which by default consists of scalaSource and javaSource. Add a directory to unmanagedSourceDirectories in the appropriate configuration to add a source directory. For example, to add extra-src to be an additional directory containing main sources,

Compile / unmanagedSourceDirectories += baseDirectory.value / "extra-src"

That would be the following in your build.sbt:

Compile / unmanagedSourceDirectories += baseDirectory.value / "src" / "main" / "python"
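Since the question mentions sbt 0.13.16, note that the slash syntax used above is only available from sbt 1.1 onward. A rough 0.13-style equivalent might look like the following, assuming the Python directory is src/main/python under the project's base directory, as in the question's layout:

```scala
// build.sbt (sbt 0.13.x syntax; sketch based on the layout shown in the question)
// baseDirectory.value is the project's own directory, so the path is built relative to it
// rather than from an absolute path such as "/parent_project/...".
unmanagedSourceDirectories in Compile += baseDirectory.value / "src" / "main" / "python"
```

Whether the .py files actually end up in the assembled jar with this setting is worth verifying by inspecting the archive, since source directories are primarily fed to the compiler rather than copied as-is.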

Upvotes: 0
