Reputation: 13
I have published a dataset in Azure Data Factory, but I cannot find any way to access that dataset in Databricks.
The dataset was published from a service that is connected to AWS S3. Here's the picture.
I have tried reading the Azure documentation, but most of it suggests copying the data into Azure Data Lake Storage. Is that the only way to access the data in Databricks?
Please share any good documentation links.
Upvotes: 0
Views: 117
Reputation: 3145
You can configure Azure Databricks to access the data directly from AWS S3 by setting up the necessary credentials and configuration, so copying the data into Azure Data Lake Storage is not your only option.
For details, see the documentation on external locations and storage credentials, and on how to create an IAM role.
Also see: Access S3 buckets with Unity Catalog volumes or external locations. A sketch of that setup follows below.
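As a rough sketch of that setup: the credential name, external location name, and bucket path below are placeholders, and it assumes the storage credential has already been created (typically through Catalog Explorer) with the ARN of the IAM role that grants access to the bucket.
# Minimal sketch, assuming a storage credential named my_s3_credential already
# exists and is backed by an IAM role that can read s3://my-bucket (names are placeholders).
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_s3_location
  URL 's3://my-bucket/external-location'
  WITH (STORAGE CREDENTIAL my_s3_credential)
""")

# Verify the workspace can reach the bucket through the external location
display(dbutils.fs.ls("s3://my-bucket/external-location/"))
Once the external location is in place, the paths in the read/write examples below resolve through it without any per-cluster AWS keys.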
The following code will help you read from and write to the S3 bucket:
Reading from S3:
# List the files under the external location path
dbutils.fs.ls("s3://my-bucket/external-location/path/to/data")

# Load the Parquet files into a DataFrame
df = spark.read.format("parquet").load("s3://my-bucket/external-location/path/to/data")

# Or query the same path directly with SQL
spark.sql("SELECT * FROM parquet.`s3://my-bucket/external-location/path/to/data`")
Writing to S3:
# Move (rename) files within the bucket
dbutils.fs.mv("s3://my-bucket/external-location/path/to/data", "s3://my-bucket/external-location/path/to/new-location")

# Write the DataFrame back to S3 as Parquet
df.write.format("parquet").save("s3://my-bucket/external-location/path/to/new-location")
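If you register the bucket as a Unity Catalog volume instead of (or in addition to) an external location, you can address it with a /Volumes path. A minimal sketch, where the catalog, schema, and volume names are placeholders:
# Minimal sketch, assuming a volume my_volume exists in catalog my_catalog,
# schema my_schema, and is backed by the same S3 bucket (names are placeholders).
volume_path = "/Volumes/my_catalog/my_schema/my_volume/path/to/data"

# List files through the volume path
dbutils.fs.ls(volume_path)

# Read the Parquet data from the volume into a DataFrame
df_volume = spark.read.format("parquet").load(volume_path)
display(df_volume)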
Upvotes: 0