Prashant

Reputation: 772

Tungsten encoding in Spark SQL?

I am running a Spark application that executes a series of Spark SQL statements one after the other. The SQL queries are quite complex, and the application works (it generates output). I am now working on improving its processing performance within Spark.

Does Tungsten encoding have to be enabled separately, or does it kick in automatically when running Spark SQL?

I am using Cloudera 5.13 on a 2-node cluster.

Upvotes: 2

Views: 1909

Answers (2)

H Roy

Reputation: 635

Tungsten became the default in Spark 1.5 and can be enabled in earlier versions by setting spark.sql.tungsten.enabled=true. Even without Tungsten, Spark SQL uses a columnar storage format with Kryo serialization to minimize storage cost.

To make sure your code benefits as much as possible from Tungsten optimizations, try to use the default Dataset API with Scala (instead of RDDs).

The Dataset API brings the best of both worlds, mixing relational (DataFrame) and functional (RDD) transformations. Dataset APIs are the most up to date and add type safety, along with better error handling and far more readable unit tests.

Upvotes: 0

WestCoastProjects

Reputation: 63201

It is enabled by default in Spark 2.x (and maybe 1.6, but I'm not sure about that).

In any case, you can set it explicitly:

 spark.sql.tungsten.enabled=true

The setting can be passed to spark-submit as follows:

spark-submit --conf spark.sql.tungsten.enabled=true

Tungsten should be enabled if you see a * next to the operators in the query plan. The * marks operators that run under whole-stage code generation.
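As a rough way to apply that check to a plan you have copied out as text, the sketch below lists the operators that carry a leading *. The helper name `starred_operators` and the sample plan are hypothetical: the sample only imitates the style of plan text printed on Spark 2.x and was not captured from a real job. Newer versions appear to print the marker as `*(1)`, which the pattern also tolerates; treat that as an assumption to verify against your own output.

```python
import re

# Made-up plan text in the style of Spark 2.x physical plans (an assumption,
# not output captured from a real job).
SAMPLE_PLAN = """\
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_count(1)])
      +- *Project
         +- *Range (0, 10, step=1, splits=8)
"""

# Optional tree-drawing prefix, then "*", an optional "(N)" stage id,
# then the operator name.
_STARRED = re.compile(r"^[\s:|+\-]*\*(?:\(\d+\))?\s*(\w+)")


def starred_operators(plan_text):
    """Return the names of operators whose plan line starts with a '*' marker."""
    names = []
    for line in plan_text.splitlines():
        match = _STARRED.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(starred_operators(SAMPLE_PLAN))
```

In the sample, Exchange has no * because a shuffle is a boundary between code-generated stages, so an unstarred operator here and there does not by itself mean Tungsten is off.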

Also see: How to enable Tungsten optimization in Spark 2?

Upvotes: 3
