ebonnal

Reputation: 1167

How does Delta Lake (deltalake) guarantee ACID transactions?

What mechanisms does Delta Lake use to ensure the atomicity, consistency, isolation, and durability of transactions initiated by user operations on a DeltaTable?

Upvotes: 3

Views: 852

Answers (1)

ebonnal

Reputation: 1167

0. The deltalog

The deltalog is Delta Lake's transaction log.

It is an ordered collection of JSON files. It acts as the single source of truth, giving users access to the latest version of a DeltaTable's state.
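To make the "ordered JSON files" idea concrete, here is a minimal sketch of how a reader could rebuild the table state by replaying the log. It is a toy model, not Delta Lake's code: the helper names (write_commit, replay) are made up for illustration, and only the add and remove actions are handled. The zero-padded file names and the one-JSON-action-per-line layout mirror what you find in a real table's _delta_log directory.

```python
import json
import tempfile
from pathlib import Path


def write_commit(log_dir: Path, version: int, actions: list) -> None:
    """Write one commit file: one JSON action per line, named by version."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


def replay(log_dir: Path) -> set:
    """Replay commits in version order and return the live data files."""
    live = set()
    # Zero-padded names sort lexicographically in version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical table history: two appends, then a rewrite of part-0.
log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir()
write_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(log_dir, 1, [{"add": {"path": "part-1.parquet"}}])
write_commit(log_dir, 2, [{"remove": {"path": "part-0.parquet"}},
                          {"add": {"path": "part-2.parquet"}}])

state = replay(log_dir)
print(sorted(state))  # ['part-1.parquet', 'part-2.parquet']
```

Note that part-0.parquet is never touched on disk here: it simply stops being part of the table once a later commit removes it from the log.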

1. Atomicity

  • Delta Lake breaks down every operation performed by a user into commits, each composed of actions.
  • A commit is recorded in the deltalog only once all of its actions have completed successfully (otherwise it is rolled back and retried, or an error is thrown), so a commit is never partially visible. This is what makes it atomic.
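The "record only once every action has succeeded" rule can be sketched as follows. This is an illustration under assumptions, not the actual implementation: try_commit and the step callables are hypothetical, and the local filesystem's exclusive-create mode stands in for whatever atomic primitive the real storage layer provides.

```python
import json
import tempfile
from pathlib import Path


def try_commit(log_dir: Path, version: int, steps) -> bool:
    """Run every step; publish the commit file only if all of them succeed."""
    actions = []
    try:
        for step in steps:
            actions.append(step())  # e.g. write a data file, return its action
    except Exception:
        return False  # nothing was added to the log: readers see no change
    path = log_dir / f"{version:020d}.json"
    # Mode "x" fails if the file already exists, so a version is written once.
    with open(path, "x") as f:
        f.write("\n".join(json.dumps(a) for a in actions))
    return True


def failing_step():
    raise OSError("simulated write failure")


log_dir = Path(tempfile.mkdtemp())
ok = try_commit(log_dir, 0, [lambda: {"add": {"path": "a.parquet"}}])
failed = try_commit(log_dir, 1, [lambda: {"add": {"path": "b.parquet"}},
                                 failing_step])
committed = sorted(p.name for p in log_dir.glob("*.json"))
```

After this runs, only version 0 exists in the log. The second commit got halfway through its steps, but since its commit file was never created, the table looks exactly as if it had never started.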

2. Consistency

The consistency of a DeltaTable is guaranteed by strong schema checking: a write whose data does not match the table's schema is rejected instead of being committed.
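As a rough picture of what schema checking means at write time, consider the sketch below. It is deliberately stricter and simpler than Delta Lake's real rules: the check_schema helper is hypothetical, the schema is a plain dict of Python types, and every row must match it exactly.

```python
def check_schema(schema: dict, rows: list) -> None:
    """Raise ValueError if any row does not match the table schema exactly."""
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise ValueError(f"row {i}: columns {sorted(row)} != {sorted(schema)}")
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                raise ValueError(f"row {i}: {column!r} is not {expected.__name__}")


schema = {"id": int, "name": str}  # assumed table schema for the example

check_schema(schema, [{"id": 1, "name": "a"}])  # matches: accepted silently

try:
    check_schema(schema, [{"id": "2", "name": "b"}])  # id has the wrong type
    rejected = False
except ValueError:
    rejected = True
```

The important part is where the check happens: before anything is committed, so a bad write fails as a whole and leaves the table unchanged.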

3. Isolation

Concurrent commits are isolated from one another using optimistic concurrency control:

  • When a commit starts, the writer takes a snapshot of the current deltalog.
  • When the commit's actions have completed, the writer checks whether another writer has updated the deltalog in the meantime:
    • If not, it records the commit in the deltalog.
    • Otherwise, it refreshes its view of the DeltaTable and tries again to record the commit, reprocessing first if needed.
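The snapshot / check / retry loop above can be sketched with an in-memory log. Everything here is an assumption made for illustration: ToyLog and commit_with_retry are invented names, the lock stands in for an atomic "create this version if it does not exist yet" operation on storage, and every conflict is simply retried, whereas a real implementation would first decide whether the concurrent changes actually conflict.

```python
import threading


class ToyLog:
    """In-memory stand-in for the deltalog: a list of commits, one per version."""

    def __init__(self):
        self._commits = []
        self._lock = threading.Lock()  # models an atomic "create if absent"

    def latest_version(self) -> int:
        with self._lock:
            return len(self._commits) - 1

    def try_append(self, expected_version: int, commit: dict) -> bool:
        """Record the commit only if nobody committed since expected_version."""
        with self._lock:
            if len(self._commits) - 1 != expected_version:
                return False
            self._commits.append(commit)
            return True


def commit_with_retry(log: ToyLog, make_commit, max_attempts: int = 100) -> int:
    """Snapshot, do the work, then validate and record; retry on conflict."""
    for _ in range(max_attempts):
        snapshot = log.latest_version()
        commit = make_commit(snapshot)  # reprocess against the fresh snapshot
        if log.try_append(snapshot, commit):
            return snapshot + 1  # version assigned to this commit
    raise RuntimeError("too many concurrent commits")


log = ToyLog()
versions = []


def worker(name: str) -> None:
    versions.append(commit_with_retry(log, lambda v: {"writer": name, "read": v}))


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever the interleaving, the eight writers end up with eight distinct, consecutive versions: no commit is lost and no two commits share a version, which is the serial order that isolation promises.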

4. Durability

Commits whose actions mutate the DeltaTable's data must finish writing or deleting the underlying Parquet files (stored on the filesystem) before they are considered successfully completed, which makes them durable.


Further reading:

Diving Into Delta Lake: Unpacking The Transaction Log

ACID properties

Upvotes: 2
