Reputation: 61
I recently started working with Spark and would like to know: if I have to run queries, which is the better choice, Spark SQL or Databricks SQL, and why?
Upvotes: 2
Views: 4513
Reputation: 449
In a nutshell, you can download Apache Spark pre-built with Hadoop. The package is free to download. Additionally, you can add Delta Lake and other third-party software.
Databricks, by contrast, is a platform you have to pay for; it contains Apache Spark + Delta Lake + many built-in extras.
As expected, performance and SQL dialect differ between Hadoop and Delta Lake, since they are different systems.
You can install Delta Lake in Apache Spark, so you can compare Hadoop vs Delta Lake yourself.
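To make the "install Delta Lake in Apache Spark" step concrete, here is a minimal sketch in Python of the session settings usually needed on an open-source Spark install. The helper names `delta_spark_conf` and `build_delta_session` are made up for illustration, and the Maven coordinate is something you pass in yourself; the one in the docstring is only an example and has to match your Spark and Scala versions.

```python
def delta_spark_conf(delta_package):
    """Return the Spark settings commonly used to enable Delta Lake.

    delta_package is a Maven coordinate chosen by the caller, for example
    "io.delta:delta-spark_2.12:3.1.0" (an example only; check it against
    the Spark version you actually run).
    """
    return {
        # Pull the Delta Lake jars when the session starts.
        "spark.jars.packages": delta_package,
        # Register Delta's SQL extensions and catalog implementation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(delta_package, app_name="delta-demo"):
    """Create a SparkSession with Delta Lake enabled (needs pyspark installed)."""
    # Imported lazily so delta_spark_conf() can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_spark_conf(delta_package).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session built this way you should be able to write the same data once as Parquet and once with `format("delta")` and time the same query against both.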
Upvotes: 0
Reputation: 87069
We need to distinguish two things here:
As was mentioned in this answer, Databricks SQL as a language is primarily based on Spark SQL, with some additions specific to Delta Lake tables (like CREATE TABLE CLONE, ...). ANSI compatibility in Databricks SQL is controlled with the ANSI_MODE setting, and will be enabled by default in the future.
But when it comes to execution, Databricks SQL differs from the Spark SQL engine because it uses the Photon engine, which is heavily optimized for modern hardware and BI/DW workloads. With Photon you can get a significant speedup (2-3x) compared to the standard Spark SQL engine on complex queries that process a lot of data.
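To illustrate the language-level points above, here is a small sketch in Python that builds the SQL statements involved. The function names and the `engine` labels (`"databricks_sql"`, `"spark_sql"`) are hypothetical; the statements themselves follow the documented `SET ANSI_MODE` setting in Databricks SQL, the `spark.sql.ansi.enabled` setting in open-source Spark, and the `CREATE TABLE ... CLONE` syntax for Delta tables on Databricks. Treat it as a sketch, not a definitive reference.

```python
def ansi_mode_statement(engine, enabled=True):
    """Build the statement that toggles ANSI compatibility for a given engine."""
    flag = "true" if enabled else "false"
    if engine == "databricks_sql":
        # Databricks SQL exposes ANSI compatibility as ANSI_MODE.
        return f"SET ANSI_MODE = {flag}"
    if engine == "spark_sql":
        # Open-source Spark SQL uses a regular Spark configuration key.
        return f"SET spark.sql.ansi.enabled = {flag}"
    raise ValueError(f"unknown engine: {engine}")


def clone_statement(target, source, shallow=True):
    """Build a Delta-specific CREATE TABLE ... CLONE statement (Databricks)."""
    kind = "SHALLOW" if shallow else "DEEP"
    return f"CREATE TABLE IF NOT EXISTS {target} {kind} CLONE {source}"
```

The resulting strings can be passed to `spark.sql(...)` in a notebook or pasted into a SQL editor; the clone statement is one of the additions that plain Spark SQL without Delta Lake on Databricks would not accept.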
Upvotes: 3