Reputation: 3739
I have a Snowflake query in Airflow that uses a password in a Jinja template:
create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
url='{{ params.s3_bucket }}'
credentials=(aws_key_id='{{ params.login }}' aws_secret_key='{{ params.password }}');
The problem is that the password shows up in the query in the log. Is there any way of hiding it?
Upvotes: 1
Views: 697
Reputation: 1855
Provider operators work with hooks that automatically hide the sensitive parts of the credentials in the logs. For example, you can check S3ToRedshiftOperator,
which uses the S3 hook to get the credentials. I don't know how you're using that template, but I highly recommend using the same pattern with hooks to avoid showing the secret key in the logs.
This is what it's showing for me:
[2021-08-17 07:40:33,584] {{base.py:78}} INFO - Using connection to: id: redshift. Host: my-cluster-readable.us-east-1.redshift.amazonaws.com, Port: 5439, Schema: schema, Login: my_login_readable, Password: ***, extra: {}
[2021-08-17 07:40:33,601] {{dbapi.py:204}} INFO - Running statement:
COPY schema.table1
FROM 's3://s3-bucket-xxx/folder1/folder2/'
with credentials
'aws_access_key_id=MY_ACCESS_KEY_THAT_IS_READABLE;aws_secret_access_key=***'
IGNOREHEADER 1
DELIMITER ','
FORMAT CSV
EMPTYASNULL
BLANKSASNULL
ROUNDEC
TRUNCATECOLUMNS
TRIMBLANKS
GZIP;
, parameters: None
[2021-08-17 07:40:43,105] {{redshift_operators.py:122}} INFO - COPY command complete...
As you can see there, both the Password for the connection and the aws_secret_access_key are shown as *** (masked automatically by Airflow using the hook).
My recommendation would be to use exactly the same logic, something like this:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.utils.redshift import build_credentials_block
# More code here
# Get the AWS credentials from the Airflow connection through the hook
s3_hook = S3Hook(aws_conn_id="your_conn_id")
credentials = s3_hook.get_credentials()
# Build the credentials string used by the COPY statement
credentials_block = build_credentials_block(credentials)
# Invoke the template here using the credentials_block as a param
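For the stage-creation query in the question, a minimal sketch of that pattern could look like the following. The SQL and param names come from the question's template; the SnowflakeOperator, the conn_id values, and the param values are placeholders/assumptions, and the secret is pulled from an Airflow connection via the hook rather than hard-coded:
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="create_snowflake_stage",
    start_date=datetime(2021, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Sketch: read the AWS keys from an Airflow connection via the hook
    # instead of hard-coding them in the DAG file or in Variables.
    credentials = S3Hook(aws_conn_id="your_aws_conn_id").get_credentials()

    create_stage = SnowflakeOperator(
        task_id="create_blabla_ext_stage",
        snowflake_conn_id="your_snowflake_conn_id",
        sql="""
            create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
            url='{{ params.s3_bucket }}'
            credentials=(aws_key_id='{{ params.login }}' aws_secret_key='{{ params.password }}');
        """,
        params={
            # Placeholder values -- replace with your own
            "dest_database": "my_db",
            "stg_schema": "my_schema",
            "stg_prefix": "stg_",
            "s3_bucket": "s3://my-bucket/path/",
            "login": credentials.access_key,
            "password": credentials.secret_key,
        },
    )
The idea is that, because the secret key comes from a connection instead of a value written directly into the DAG file, Airflow can mask it in the task logs in the same way as in the output above.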
Upvotes: 1
Reputation: 6229
Why do you want to create a stage on every run of the Airflow job? Create it in Snowflake first and then use it in the DAG run.
If you need to create the stage inside the job, you can use Snowflake's Storage Integrations for this, as sketched below.
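As a rough sketch (the integration name my_s3_integration, the conn_id, and the param values are placeholders; the integration itself would be created once in Snowflake by an admin with CREATE STORAGE INTEGRATION), the stage definition then carries no credentials at all, so there is nothing sensitive to leak into the logs:
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="create_stage_with_integration",
    start_date=datetime(2021, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # The stage points at a pre-created storage integration rather than
    # raw AWS keys, so no credentials appear in the SQL or in the logs.
    create_stage = SnowflakeOperator(
        task_id="create_blabla_ext_stage",
        snowflake_conn_id="your_snowflake_conn_id",
        sql="""
            create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
            url='{{ params.s3_bucket }}'
            storage_integration = my_s3_integration;
        """,
        params={
            # Placeholder values -- replace with your own
            "dest_database": "my_db",
            "stg_schema": "my_schema",
            "stg_prefix": "stg_",
            "s3_bucket": "s3://my-bucket/path/",
        },
    )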
Upvotes: 1