Reputation: 741
I'm trying to connect to and read all my CSV files from an S3 bucket with Databricks PySpark. When I use a bucket that I have admin access to, it works without error:
data_path = 's3://mydata_path_with_adminaccess/'
But when I try to connect to a bucket which needs an ACCESS_KEY_ID and SECRET_ACCESS_KEY, it does not work and access is denied.
I tried:
data_path = 's3://mydata_path_without_adminaccess/'
AWS_ACCESS_KEY_ID='my key'
AWS_SECRET_ACCESS_KEY='my key'
and:
data_path = 's3://<MY_ACCESS_KEY_ID>:<My_SECRET_ACCESS_KEY>@mydata_path_without_adminaccess'
Upvotes: 2
Views: 4942
Reputation: 41
To connect to S3 from Databricks using an access key, you can simply mount the S3 bucket on Databricks. This creates a pointer to your S3 bucket in DBFS. If you already have a secret stored in Databricks, retrieve it as below:
access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
If you do not have a secret stored in Databricks, use the keys directly to avoid the "Secret does not exist with scope" error:
access_key = "your-access-key"
secret_key = "your-secret-key"
# Mount the bucket on Databricks ("/" in the secret key must be URL-encoded)
encoded_secret_key = secret_key.replace("/", "%2F")
aws_bucket_name = "s3-bucket-name"
mount_name = "mount-name"
dbutils.fs.mount("s3a://%s:%s@%s" % (access_key, encoded_secret_key, aws_bucket_name), "/mnt/%s" % mount_name)
# Verify the mount by listing its contents
display(dbutils.fs.ls("/mnt/%s" % mount_name))
Access your S3 data as below:
mount_name = "mount-name"
file_name = "file-name"
df = spark.read.text("/mnt/%s/%s" % (mount_name, file_name))
df.show()
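Since the question is about CSV files, here is a minimal sketch of reading every CSV under the mount; the mount path and the header/inferSchema options are assumptions, so adjust them to your data:
# Read all CSV files under the mount into one DataFrame (path and options are placeholders)
csv_df = (spark.read
          .option("header", "true")       # assumes the files have a header row
          .option("inferSchema", "true")  # let Spark infer column types
          .csv("/mnt/mount-name/"))
csv_df.show()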
Upvotes: 2
Reputation: 5296
I am not sure whether you have already tried mounting your bucket in Databricks using the secret and access keys, but it's worth trying. Here is the code for that:
ACCESS_KEY = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
SECRET_KEY = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")  # "/" in the secret key must be URL-encoded
AWS_BUCKET_NAME = "<aws-bucket-name>"
MOUNT_NAME = "<mount-name>"
dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s" % MOUNT_NAME))
and then you can access files in your S3 bucket as if they were local files:
df = spark.read.text("/mnt/%s/...." % MOUNT_NAME)
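If the mount point already exists from an earlier attempt, dbutils.fs.mount will fail, so you may want to unmount it first; a small sketch, assuming the same MOUNT_NAME as above:
# Unmount /mnt/<mount-name> first if it is already mounted
if any(m.mountPoint == "/mnt/%s" % MOUNT_NAME for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/%s" % MOUNT_NAME)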
Additional reference:
https://docs.databricks.com/data/data-sources/aws/amazon-s3.html
Hope it helps.
Upvotes: 4