Reputation: 85
From what I've read so far I have come to the following conclusion. A dedicated SQL Pool could do everything a Delta Lake could like ACID transactions, scaling capabilities, handle batch and streaming data etc. so what is are the differences between using a Delta Lake over a dedicated SQL Pool?
Upvotes: 0
Views: 1384
Reputation: 69
Both Dedicated SQL Pool and Delta Lake implements ACID transactions, Schema Enforcement and Scaling capabilities. But there are lot more features in Delta Lake which are NOT in Dedicated SQL pool like the below.
Schema evolution in Delta Lake is a feature that allows users to easily change a table's current schema to accommodate data that is changing over time.
Time travel is getting to the previous version. Time travel can be performed using Version number and the Timestamp.
Versioning : new records got inserted or updated, versions will be maintained. Number of versions is equal to the number of Transaction log files created.
UPSERT (Merge): If any matching row found, it will Update the records If no matching rows found, this will insert that as new row.
Optimize: Used to combine multiple small parquet file into a single large Parquet file and reduce storage costs.
Vacuum commend is used to delete the Parquet files which are not in the latest state in the transaction log.
Dedicated SQL pool is the home for Structured data, whereas Delta Lake is a single home for structured, Semi-structured and unstructured data.
Hope these differences helps you to decide Delta Lake Vs Dedicated SQL Pool.
Upvotes: 0
Reputation: 7758
There are a number of differences.
Dedicated SQL Pool is an "always-on" resource. You pay 24x7 for the service to be live whether you are using it or not and it is an expensive option. Delta consumption methods tend to be Consumption based, so you only pay for the compute you need.
Dedicated SQL Pool is SQL focused. Data is stored in physical Tables, which need a proper architecture to perform well (Column store vs Clustered indexes, additional indexes, Partition and Distribution schemes, etc.) Overall, Dedicated SQL Pool requires a lot more know-how and management.
Delta is really just a Storage format and can be natively consumed by Notebooks. In the case of Synapse (and now Fabric), Delta can also be queried with SQL, so it is more flexible than pure SQL.
Delta Lake (or Lake tables) are really just metadata constructs over the Delta files in Storage. In Fabric, both Lakehouse and Warehouse are backed by Delta. Lakehouse is a Notebook-centric environment (that also supports SQL endpoints) similar to the Lake Tables in Synapse and Warehouse is a pure SQL environment like Dedicated SQL Pool, but without all the extra overhead. Both costs are managed by the same Fabric capacities.
Upvotes: 0