Ashika Umanga Umagiliya
Ashika Umanga Umagiliya

Reputation: 9158

Apache Spark 3 and backward compatibility?

We have several Spark applications running on production developed using Spark 2.4.1 (Scala 2.11.12). For couple of our new Spark jobs,we are considering utilizing features of DeltaLake.For this we need to use Spark 2.4.2 (or higher).

My questions are:

  1. If we upgrade our Spark cluster to 3.0.0, can our 2.4.1 applications still run on the new cluster (without recompile)?
  2. If we need to recompile our previous Spark jobs with Spark 3, are they source compatible or do they need any migration?

Upvotes: 3

Views: 8194

Answers (2)

Powers
Powers

Reputation: 19308

You can cross compile projects Spark 2.4 projects with Scala 2.11 and Scala 2.12. The Scala 2.12 JARs should generally work for Spark 3 applications. There are edge cases when using a Spark 2.4/Scala 2.12 JAR won't work properly on a Spark 3 cluster.

It's best to make a clean migration to Spark 3/Scala 2.12 and cut the cord with Spark 2/Scala 2.11.

Upgrading can be a big pain, especially if your project has a lot of dependencies. For example, suppose your project depends on spark-google-spreadsheets, a project that's not built with Scala 2.12. With this dependency, you won't be able to easily upgrade your project to Scala 2.12. You'll need to either compile spark-google-spreadsheets with Scala 2.12 yourself or drop the dependency. See here for more details on how to migrate to Spark 3.

Upvotes: 4

zsxwing
zsxwing

Reputation: 20826

There are some breaking changes in Spark 3.0.0, including source incompatible change and binary incompatible changes. See https://spark.apache.org/releases/spark-release-3-0-0.html. And there are also some source and binary incompatible changes between Scala 2.11 and 2.12, so you may also need to update codes because of Scala version change.

However, only do Delta Lake 0.7.0 and above require Spark 3.0.0. If upgrading to Spark 3.0.0 requires a lot of work, you can use Delta Lake 0.6.x or below. You just need to upgrade Spark to 2.4.2 or above in 2.4.x line. They should be source and binary compatible.

Upvotes: 4

Related Questions