Paul Draper

Reputation: 83225

What determines recompilation in SBT?

Background

I have an SBT project with a large number of subprojects. After VCS changes (pulling, switching branches, etc.), it can take a long time to recompile. I want to reduce that time by employing a distributed cache on a per-subproject basis. Buck has a good explanation of this sort of strategy:

A build rule knows all of the inputs that can affect its output, and therefore it can combine that information into a hash that represents the total input.

When Buck begins to build a build rule, the first thing it does is compute the cache key for the rule. If there is a hit in any of the caches specified in .buckconfig, then it will fetch the rule's output from the cache instead of building the rule locally.

If you are using some sort of continuous integration (CI) system, you will likely want your CI builds to populate a cache that can be read by your local builds. That way, when a developer syncs to a revision that has already been built on your CI system, running buck build should not build anything locally, as all outputs should be able to be pulled from the cache.

So I want to be able to populate the target directory from a cache whenever the cache key matches.
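The Buck-style cache key described above can be sketched in shell. This is only an illustration of the idea, not SBT's actual algorithm; the input set (source files, dependency JARs, a compiler-flags file) and all paths here are hypothetical:

```shell
# Illustration only: combine every input that can affect a subproject's
# output into a single cache key. Paths and the flags file are made up.
mkdir -p src/main/scala lib
echo 'class Foo {}' > src/main/scala/Foo.scala
echo '-deprecation' > scalac.flags

# Hash each input file, then hash the sorted list of hashes into one key.
cache_key=$( (
  find src/main/scala lib -type f | sort | xargs -r sha1sum
  sha1sum scalac.flags
) | sha1sum | cut -d' ' -f1)

echo "$cache_key"
```

The key stays stable until the content of some input changes, so a CI build and a local build that sync to the same revision compute the same key and can share one cache entry.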


Question

The problem is that I can't figure out when SBT decides to recompile.

build.properties

sbt.version=0.13.7

src/main/scala/Foo.scala

class Foo {}

First compile:

$ sbt compile
[info] Compiling 1 Scala source
[success]

Changing the source triggers recompilation

$ echo >> src/main/scala/Foo.scala
$ sbt compile
[info] Compiling 1 Scala source
[success]

Changing the source timestamp doesn't trigger recompilation

$ touch src/main/scala/Foo.scala
$ sbt compile
[success]

Changing the target timestamp triggers recompilation

$ touch target/scala-2.10/classes/Foo.class
$ sbt compile
[info] Compiling 1 Scala source
[success]

How does SBT know when the targets don't match the sources? (And can I place targets in such a way that SBT accepts them?)

Upvotes: 1

Views: 511

Answers (1)

jsuereth

Reputation: 5624

The answer is a bit sophisticated. We take some approaches similar to buck, but without the forethought of having a global distributable cache.

Basically, here's the general gist:

  1. We look at hashes of the files. I believe this is configurable, but just touching a file may not be enough to mark it as changed. You may additionally have to add a space or something which affects the SHA-1 of the file.
  2. We look at the API exposed by the file. This is done using what we call the "name hasher". It essentially constructs a hash of the names used in a source file to determine whether changes in one source file affect another.
  3. We check the generated binaries to see if they have changed since we last generated them. If so, we throw away what we knew about what we did and assume we must do it all over again to be safe (i.e. some external process has been playing in our sandbox and we don't know if there's something hiding below the surface).
    1. We check to see if any JAR dependencies have been modified, or if there are any new ones
    2. We check to see if any compiler flags have changed (e.g. -optimise requires recompile).
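The content-hash check in point 1 explains the `touch` behavior from the question. A quick shell demonstration, using `sha1sum` as a stand-in for whatever hash function SBT actually uses internally:

```shell
echo 'class Foo {}' > Foo.scala
before=$(sha1sum Foo.scala | cut -d' ' -f1)

touch Foo.scala                  # new mtime, identical content
after_touch=$(sha1sum Foo.scala | cut -d' ' -f1)
[ "$before" = "$after_touch" ] && echo "touch: hash unchanged, no recompile"

echo >> Foo.scala                # appending a newline changes the content
after_edit=$(sha1sum Foo.scala | cut -d' ' -f1)
[ "$before" != "$after_edit" ] && echo "edit: hash changed, recompile"
```

This matches the transcripts above: `touch` leaves the hash alone, while `echo >>` changes it and triggers recompilation.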

The third point recently got us into trouble with external bytecode-manipulation libraries. We recently extended the default build to run these manipulations before we construct our hashes/cached information about what we did, see: https://github.com/sbt/sbt/pull/1714

Hopefully that helps clarify things. There may be more going on in our checks; most of it is in the compile/ directory of sbt.

Upvotes: 1
