Boon

Reputation: 1153

Delta Lake independent of Apache Spark?

I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right there on the project home page https://delta.io/ there is a diagram showing Delta Lake running on "your existing data lake" without any mention of Spark. Elsewhere it suggests that Delta Lake does indeed run on top of Spark. So my question is: can it be run independently of Spark? Can I, for example, set up Delta Lake with S3 buckets for storage in Parquet format, with schema validation etc., without using Spark in my architecture?

Upvotes: 9

Views: 1506

Answers (4)

Jayakrishnan GK

Reputation: 737

Yes, this is absolutely possible. We built a scalable data backend using this approach, with Delta Lake, the Glue Data Catalog, Amazon S3 and Amazon Athena. Amazon Athena can be used to query the data instead of Apache Spark.

Please refer to this blog that explains the same in detail.

Upvotes: 0

Hongbo Miao

Reputation: 49724

Currently, you can use delta-rs to read and write to Delta Lake directly.

It supports Rust and Python. Here is an example using Python:

You can install it with pip install deltalake or conda install -c conda-forge deltalake.

import pandas as pd
from deltalake import write_deltalake

# Build a small DataFrame and write it as a new Delta table on local disk.
df = pd.DataFrame({"x": [1, 2, 3]})
write_deltalake("path/to/delta-tables/table1", df)

Writing to S3

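storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",  # placeholder credentials
    "AWS_SECRET_ACCESS_KEY": "xxx",
    # Allows writing without a locking provider; only safe with a single writer.
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}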

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)

To drop AWS_S3_ALLOW_UNSAFE_RENAME and allow concurrent writers, you need to set up a DynamoDB lock.

Follow this GitHub ticket for updates on how to set it up correctly.

Upvotes: 1

Joshua Cook

Reputation: 13425

You might keep an eye on this: https://github.com/delta-io/delta-rs

It's early and currently read-only, but worth watching as the project evolves.

Upvotes: 8

Jacek Laskowski

Reputation: 74619

tl;dr No


Delta Lake up to and including 0.8.0 is tightly integrated with Apache Spark so it's impossible to have Delta Lake without Spark.

Upvotes: -3
