khÜs h

Reputation: 61

Spark SQL vs Databricks SQL

I recently started working with Spark and am eager to know: if I have to run queries, which would be better, Spark SQL or Databricks SQL, and why?

Upvotes: 2

Views: 4513

Answers (2)

Luis Estrada

Reputation: 449

In a nutshell: you can download Apache Spark pre-built for Apache Hadoop. The package is free to download. You can additionally add Delta Lake and other third-party software.

Databricks, by contrast, is a paid platform: it contains Apache Spark + Delta Lake + many built-in extras.

As expected, performance and the SQL dialect differ between Hadoop and Delta Lake, since they are different storage systems.

Since you can install Delta Lake on Apache Spark yourself, the real comparison is Hadoop vs Delta Lake.
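To make "install Delta Lake on Apache Spark" concrete, here is a minimal Python sketch that builds the extra command-line arguments for launching `pyspark` or `spark-submit` with Delta Lake enabled. The package coordinates and the two `--conf` settings follow the Delta Lake quickstart; the default versions (Delta 3.1.0, Scala 2.12) are assumptions that must match your Spark installation, and the helper name `delta_launch_args` is hypothetical.

```python
import shlex

# Settings from the Delta Lake quickstart for open-source Apache Spark.
DELTA_EXTENSION = "io.delta.sql.DeltaSparkSessionExtension"
DELTA_CATALOG = "org.apache.spark.sql.delta.catalog.DeltaCatalog"


def delta_launch_args(delta_version: str = "3.1.0", scala_version: str = "2.12") -> list[str]:
    """Build extra pyspark/spark-submit arguments that enable Delta Lake.

    Assumption: delta_version must match your Spark version (Delta 3.1.x
    targets Spark 3.5). Delta 3.x is published as `delta-spark`; releases
    before 3.0 used the `delta-core` artifact name instead.
    """
    package = f"io.delta:delta-spark_{scala_version}:{delta_version}"
    return [
        "--packages", package,
        "--conf", f"spark.sql.extensions={DELTA_EXTENSION}",
        "--conf", f"spark.sql.catalog.spark_catalog={DELTA_CATALOG}",
    ]


if __name__ == "__main__":
    # Print a command line you could paste into a shell where Spark is installed.
    print(shlex.join(["pyspark", *delta_launch_args()]))
```

On Databricks none of this is needed, because Delta Lake is already part of the platform.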

Upvotes: 0

Alex Ott

Reputation: 87069

We need to distinguish two things here:

  • Spark SQL as a dialect of the SQL language. It originally started as the Shark and Hive on Spark projects (blog), and it is now getting close to ANSI SQL.
  • Spark SQL as the execution engine inside Spark.

As was mentioned in this answer, Databricks SQL as a language is primarily based on Spark SQL, with some additions specific to Delta Lake tables (like CREATE TABLE CLONE, ...). ANSI compatibility in Databricks SQL is controlled with the ANSI_MODE setting, and will be enabled by default in the future.
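As an illustration of what an ANSI setting changes, here is a simplified Python model (not Spark code) of one well-known difference: 32-bit INT overflow. With ANSI mode off, `SELECT 2147483647 + 1` silently wraps around to -2147483648; with it on, the query fails with an overflow error. In open-source Spark the switch is `spark.sql.ansi.enabled`; in Databricks SQL it is `SET ANSI_MODE = true`. The names `add_int` and `ArithmeticOverflow` are hypothetical.

```python
# Simplified model of INT addition under ANSI vs legacy (non-ANSI) semantics.
# Spark SQL:      SET spark.sql.ansi.enabled = true;
# Databricks SQL: SET ANSI_MODE = true;

INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


class ArithmeticOverflow(Exception):
    """Hypothetical stand-in for the runtime error ANSI mode raises on overflow."""


def add_int(a: int, b: int, ansi_mode: bool) -> int:
    result = a + b
    if INT_MIN <= result <= INT_MAX:
        return result
    if ansi_mode:
        # ANSI semantics: overflow is an error instead of a silent wrong answer.
        raise ArithmeticOverflow(f"integer overflow: {a} + {b}")
    # Legacy semantics: wrap around like 32-bit two's-complement arithmetic.
    return (result - INT_MIN) % 2**32 + INT_MIN


print(add_int(INT_MAX, 1, ansi_mode=False))  # prints -2147483648
```

The sketch models only overflow to stay short; ANSI mode also changes other behaviours, for example an invalid cast such as `CAST('abc' AS INT)` raises an error instead of returning NULL.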

But when it comes to execution, Databricks SQL differs from the Spark SQL engine because it uses the Photon engine, which is heavily optimized for modern hardware and BI/DW workloads. With Photon you can get a significant speedup (2-3x) compared to the standard Spark SQL engine on complex queries that process a lot of data.
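To put the 2-3x figure into numbers, this small Python sketch converts a baseline runtime on the standard Spark SQL engine into the range you might expect with Photon. The 120-second baseline is a hypothetical example and `photon_runtime_range` is an illustrative helper; real speedups depend on the query, the data volume, and the hardware.

```python
def photon_runtime_range(
    baseline_seconds: float, low_speedup: float = 2.0, high_speedup: float = 3.0
) -> tuple[float, float]:
    """Return (best_case, worst_case) runtime for a low..high speedup factor."""
    if baseline_seconds < 0 or not 0 < low_speedup <= high_speedup:
        raise ValueError("need baseline >= 0 and 0 < low_speedup <= high_speedup")
    # A higher speedup gives the shorter (best-case) runtime.
    return baseline_seconds / high_speedup, baseline_seconds / low_speedup


best, worst = photon_runtime_range(120)  # hypothetical 120 s query
print(f"{best:.0f}-{worst:.0f} s")  # prints "40-60 s"
```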

Upvotes: 3
