Reputation: 1309
I'm migrating from on-premises Airflow to Amazon MWAA 2.0.2. Currently I'm using an S3 connection that contains the access key ID and secret key for S3 operations:
{
  "conn_id": "s3_default",
  "conn_type": "S3",
  "extra": {
    "aws_access_key_id": "aws_access_key_id",
    "aws_secret_access_key": "aws_secret_access_key"
  }
}
I create a hook from airflow.providers.amazon.aws.hooks.s3, which extends airflow.providers.amazon.aws.hooks.base_aws, to interact with S3:

hook = S3Hook(aws_conn_id=self.s3_default)
Everything is fine. But since I'm moving to MWAA, I assume I no longer need to handle S3 access with AWS keys. My tasks run inside ECS containers, so the MWAA execution role should be enough (given the proper S3 IAM permissions are attached to the role) for boto3 to interact with S3. So what I did was remove the connection ID from the hook and delete the Airflow connection:
hook = S3Hook()
According to the AwsBaseHook documentation, if aws_conn_id is None or empty, the default boto3 behavior is used. However, it is not working and I am getting an access denied error.
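For what it's worth, here is a minimal sketch of how I can check which identity boto3 actually resolves inside a task (the function is just for debugging and not part of my real DAG):

import boto3

def whoami():
    # Prints the ARN of the identity the worker runs as; on MWAA this should be
    # an assumed-role ARN for the environment's execution role.
    print(boto3.client("sts").get_caller_identity()["Arn"])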
Does anybody know whether it is even possible to connect to AWS services from MWAA without an explicit connection, using only the permissions in the execution role?
Upvotes: 1
Views: 6595
Reputation: 149
The logs you see are misleading. They show the same info (No credentials retrieved, role_arn is None, ...) even when the call is successful. If boto3 really can't find credentials, you see this error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
It appears there is a different exception lower down in your logs; can you post the full log? I believe the credentials are not the problem.
This DAG works in a basic MWAA environment as long as you give your execution role permission to access the bucket in question (a sketch of such a policy follows the DAG).
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

DEFAULT_ARGS = {
    "owner": "airflow",
    "depends_on_past": False,
    "email": ["[email protected]"],
    "email_on_failure": False,
    "email_on_retry": False,
}

def check_bucket():
    # No aws_conn_id: the hook falls back to boto3's default credential chain,
    # i.e. the MWAA execution role.
    s3 = S3Hook()
    print(s3.check_for_bucket(bucket_name="my-bucket-name"))

with DAG(
    dag_id="test_s3_hook",
    default_args=DEFAULT_ARGS,
    start_date=days_ago(1),
    schedule_interval="@once",
) as dag:
    test_s3_hook = PythonOperator(
        task_id="test_s3_hook",
        python_callable=check_bucket,
    )
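As a rough sketch of what "permission to access the bucket" could look like, here is a minimal inline policy attached with boto3; the bucket name, role name, and policy name are placeholders, and attaching the policy through the console or your IaC tooling works just as well:

import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # check_for_bucket() issues HeadBucket, which needs s3:ListBucket
            # on the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-bucket-name",
        },
        {
            # Object-level access for tasks that read or write keys.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket-name/*",
        },
    ],
}

boto3.client("iam").put_role_policy(
    RoleName="my-mwaa-execution-role",
    PolicyName="mwaa-s3-access",
    PolicyDocument=json.dumps(policy),
)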
Upvotes: 1
Reputation: 1218
If you have set the correct permissions in the execution role, you can also create an Apache Airflow connection and specify the execution role's ARN in that connection object.
https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-create-role.html#mwaa-create-role-how-adding
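As a rough sketch of what that could look like (the conn_id, account number, and role name below are placeholders, and the exact extra fields can vary with the provider version):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Connection created through the Airflow UI or CLI, for example:
#   conn_id:   aws_exec_role
#   conn_type: aws
#   extra:     {"role_arn": "arn:aws:iam::111122223333:role/my-mwaa-execution-role"}
hook = S3Hook(aws_conn_id="aws_exec_role")
print(hook.check_for_bucket(bucket_name="my-bucket-name"))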
Upvotes: 1