Reputation: 1153
I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right there on the project home page https://delta.io/ there is a diagram showing Delta Lake running on "your existing data lake" without any mention of Spark. Elsewhere it suggests that Delta Lake does indeed run on top of Spark. So my question is: can it be run independently of Spark? Can I, for example, set up Delta Lake with S3 buckets for storage in Parquet format, schema validation, etc., without using Spark in my architecture?
Upvotes: 9
Views: 1506
Reputation: 737
Yes, this is absolutely possible. We built a scalable data backend with this approach, using Delta Lake, the AWS Glue Data Catalog, Amazon S3, and Amazon Athena. Amazon Athena can be used to query the data instead of Apache Spark.
Please refer to this blog that explains the same in detail.
Upvotes: 0
Reputation: 49724
Currently, you can use delta-rs to read and write to Delta Lake directly.
It supports Rust and Python. Here is an example using Python.
You can install it with pip install deltalake or conda install -c conda-forge deltalake.
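import pandas as pd
from deltalake.writer import write_deltalake

df = pd.DataFrame({"x": [1, 2, 3]})

# Write to a Delta table on the local filesystem.
write_deltalake("path/to/delta-tables/table1", df)

# Write to a Delta table on S3 (replace the placeholder credentials).
storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    # Skips the locking needed for safe concurrent commits on S3;
    # only use this with a single writer.
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}
write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)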
To drop AWS_S3_ALLOW_UNSAFE_RENAME and write from several processes concurrently, you need a DynamoDB lock.
Follow this GitHub ticket for updates on how to set it up correctly.
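For context on why the lock matters: a Delta commit is published by creating the next numbered JSON file under _delta_log, and two writers must never both succeed for the same version. The sketch below imitates that rule on a local filesystem with an exclusive create. It is not delta-rs code; log_path, try_commit and commit_with_retry are hypothetical names chosen for illustration. S3 historically offered no equivalent "create only if absent" primitive, which is the gap the DynamoDB lock fills.

```python
import json
import os
import tempfile


def log_path(table_dir, version):
    # Delta names commit files by zero-padded 20-digit version number.
    return os.path.join(table_dir, "_delta_log", f"{version:020d}.json")


def try_commit(table_dir, version, actions):
    """Try to publish `actions` as commit `version`; return False if it is taken."""
    os.makedirs(os.path.join(table_dir, "_delta_log"), exist_ok=True)
    try:
        # O_EXCL makes the create fail if another writer already claimed this version.
        fd = os.open(log_path(table_dir, version), os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")  # one JSON action per line
    return True


def commit_with_retry(table_dir, actions, max_attempts=10):
    """Optimistic commit: aim for the first unused version, retry on conflict."""
    version = 0
    for _ in range(max_attempts):
        while os.path.exists(log_path(table_dir, version)):
            version += 1
        if try_commit(table_dir, version, actions):
            return version
    raise RuntimeError("could not commit after repeated conflicts")


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    print(commit_with_retry(table, [{"add": {"path": "part-0.parquet"}}]))
    print(commit_with_retry(table, [{"add": {"path": "part-1.parquet"}}]))
```

This is deliberately simplified: a real writer would also re-read the commits it lost to and check whether they conflict with its own changes before retrying.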
Upvotes: 1
Reputation: 13425
You might keep an eye on this: https://github.com/delta-io/delta-rs
It's early and currently read-only, but worth watching as the project evolves.
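To give a sense of what reading a Delta table without Spark involves, here is a minimal sketch that replays the JSON commits in _delta_log to find the Parquet files that make up the current snapshot. It illustrates the table format rather than how delta-rs is implemented: active_files is a hypothetical helper, and checkpoints, deletion vectors and protocol checks are all ignored.

```python
import json
import os


def active_files(table_dir):
    """Return the data file paths in the latest snapshot by replaying JSON commits."""
    log_dir = os.path.join(table_dir, "_delta_log")
    # Commit files are named <20-digit version>.json, so sorting names sorts versions.
    commits = sorted(
        name
        for name in os.listdir(log_dir)
        if len(name) == 25 and name[:20].isdigit() and name.endswith(".json")
    )
    files = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)  # each line holds one action
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)
```

The returned paths are relative to the table root and can be opened with any Parquet reader, which is why a query engine other than Spark can work with the same data.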
Upvotes: 8
Reputation: 74619
tl;dr No
Delta Lake up to and including 0.8.0 is tightly integrated with Apache Spark so it's impossible to have Delta Lake without Spark.
Upvotes: -3