Delta table transactional guarantees when loading with Auto Loader from AWS S3 to Azure Data Lake

I am trying to use Auto Loader with AWS S3 as the source and a Delta Lake whose storage is Azure Data Lake Gen. When I try to read files, I get the following error:

Writing to Delta table on AWS from non-AWS is unsafe in terms of providing transactional guarantees. If you can guarantee that no one else will be concurrently modifying the same Delta table, you may turn this check off by setting the SparkConf: "spark.databricks.delta.logStore.crossCloud.fatal" to false when launching your cluster.

I tried setting this at the cluster level and it works fine. My question is: is there any way to ensure transactional guarantees while loading data from AWS S3 to Azure Data Lake (the Data Lake is the backing storage for our Delta Lake)? We would prefer not to set "spark.databricks.delta.logStore.crossCloud.fatal" at the cluster level. Will there be any issue if we do, and is it a good solution for a production ETL pipeline?
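For reference, this is how the flag named in the error message can be applied. The cluster-level form comes straight from the error text; the session-scoped spark.conf.set call is only a sketch, and whether this particular check honours a session-level setting may depend on your Databricks runtime.

    # Cluster level: add this line to the cluster's "Spark config" box
    # (this is the approach the error message itself describes):
    #
    #   spark.databricks.delta.logStore.crossCloud.fatal false

    # Session level (sketch, assumption): the error only documents setting
    # this when launching the cluster, so a notebook-scoped setting may or
    # may not take effect on your runtime.
    spark.conf.set("spark.databricks.delta.logStore.crossCloud.fatal", "false")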

Upvotes: 2

Views: 738

Answers (1)

Alex Ott

Reputation: 87069

This warning appears when Databricks detects that you're doing multicloud work. But the warning is aimed at the case where you're writing a Delta table into AWS S3, because S3 doesn't have an atomic write operation (such as put-if-absent), so Delta needs some kind of coordinator process, which is only available when running on AWS.

In your case you can ignore this message, because you're only reading from AWS S3 and writing into a Delta table that lives on Azure Data Lake.
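A minimal sketch of that pattern, assuming a JSON source and purely illustrative bucket/ADLS paths: the Auto Loader (cloudFiles) source reads from S3, while the Delta sink and its checkpoint live on Azure, so the cross-cloud write concern doesn't apply to the table itself.

    # Read with Auto Loader from the S3 source bucket (illustrative path)
    stream = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")      # assumed source format
        .load("s3://my-source-bucket/events/"))

    # Write the Delta table and checkpoint to Azure Data Lake (illustrative paths)
    (stream.writeStream
        .format("delta")
        .option("checkpointLocation",
                "abfss://container@account.dfs.core.windows.net/checkpoints/events")
        .start("abfss://container@account.dfs.core.windows.net/delta/events"))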

Upvotes: 1
