Reputation: 147
Has anyone tried connecting superset to AWS athena ?
I was able to connect to redshift by using SQLAlchemy URI: postgresql://username:[email protected]:port/dbname
but I am having hard time connecting to AWS athena. AWS has JDBC driver (http://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html) but I can't figure out how to use it with superset. Any example ?
Upvotes: 6
Views: 13634
Reputation: 2379
After lot of hustle managed to create connection string which works, note that all key & s3 path needs to be encoded, below format works for me
awsathena+rest://{encoded aws_access_key_id}:{encoded aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={encoded s3_staging_dir}
You can use below code to generate connection string, save it to file & run
from urllib.parse import quote_plus
conn_str = "awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/"\
"{schema_name}?s3_staging_dir={s3_staging_dir}"
print(conn_str.format(
aws_access_key_id=quote_plus("{aws_access_key_id}"),
aws_secret_access_key=quote_plus("{aws_secret_access_key}"),
region_name="{region_name}",
schema_name="{schema_name}",
s3_staging_dir=quote_plus("{s3_staging_dir}")))
Upvotes: 0
Reputation: 111
Official guidance from Superset:
https://superset.apache.org/docs/databases/athena
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
You need to have some tweaks by yourself. This worked for me after many hours of reading posts on 2021-12-12:
awsathena+rest://{secret id}:{secret access key}@athena.ap-southeast-1.amazonaws.com/test?s3_staging_dir=s3://{your bucket where Athena query result is stored}/test/&work_group=primary
Note that in my example:
"schema_name = test":
You must see a database named "test" under Athena \ Query Editor \ Database at this point. It is created in Glue Console \ Data Catalog \ Database with a crawler or manual.
s3://{your bucket}/{path if needed}/test:
you need to go to Athena \ Workgroups, select a workgroup and check the setting if it turned on the "Query result location" or not. In my case, the name of the workgroup is "primary", the query result of the "test" database will be stored in s3://{your bucket where Athena query result is stored}/test/
Make sure you have installed these under Python Virtual Environment:
pip install "PyAthenaJDBC>1.0.9"
pip install "PyAthena>1.2.0"
See how to create Superset under Python Env:
https://superset.apache.org/docs/installation/installing-superset-from-scratch
Security Group:
(I got this instruction from here: https://www.youtube.com/watch?v=vzuPQPRcT-0)
I build Superset on the EC2 Instance. Therefore, you need to check out the security group setting. Because it relates to EC2 service, Athena service, and the website which Superset is running in UI.
In my case, I have turned on all these settings to make sure it can run the first time. Then you can narrow down the setting later.
Custom TCP - TCP - 8088 - ::/0 ; 0.0.0.0/0
HTTP - TCP - 80 - ::/0 ; 0.0.0.0/0
SSH - TCP - 22 - ::/0 ; 0.0.0.0/0
Custom ICMP - IPv4 - Echo Request - N/A - 0.0.0.0/0
All ICMP - IPv6 - IPv6 ICMP - All - ::/0
All ICMP - IPv6 - IPv6 ICMP - All - 0.0.0.0/0
Upvotes: 0
Reputation: 39
We tried installing superset with PyAthena JDBC & REST. Our experience with PyAthena (REST) is far better than PyAthenaJDBC, would recommend to use same in production.
Install PyAthena (pure python library, java is not needed)
pip install "PyAthena>1.2.0"
Access database by creating connection url
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
I found this article, a good guide on deploying superset.
Upvotes: 3
Reputation: 1101
1) Make sure you have the PyAthenaJDBC pkg. installed:
pip install "PyAthenaJDBC>1.0.9"
2) Restart superset
3) Download the JDBC driver: from aws driver download I used the AthenaJDBC41-2.0.6.jar version
Example driver download url, Note: I saved my driver in /drivers/
wget https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.6/AthenaJDBC41_2.0.6.jar
4) Add the data-source to superset:
awsathena+jdbc://AWS_KEY:[email protected]/mydb?s3_staging_dir=s3://path/to/my/data/&driver_path=/drivers/AthenaJDBC41_2.0.6.jar
Note: If superset is running on ECS / EC2 you can assign an IAM role, and remove the AWS KEY/SECRET from the URI, Example raw connection URI below:
awsathena+jdbc://{aws_key}:{aws_secret}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&driver_path={driver_path}
Much more info here:
Upvotes: 0
Reputation: 2413
In case someone else would come here:
awsathena+jdbc://username:[email protected]:port/dbname
This is from the superset documentation.
Upvotes: 7
Reputation: 759
You must define a property s3_staging_dir when you connect to Athena's Driver.
Example: s3_staging_dir=s3://your_bucket
Upvotes: 0
Reputation: 303
Take a look at this github PR You'll want to install PyAthenaJDBC package into pip. The driver that you are referring to is a Java driver, which is great, but Superset is largely a Python application, so it'll need a python driver to connect/interact with Athena.
The above answer is correct, but you'll want to install that package to ensure that you actually can connect to athena.
Upvotes: 1