Reputation: 2362
I want to export an environment variable from a custom hook and retrieve its value the next time the hook is initialized. Specifically, it is a custom SnowflakeHook, in which I want to check whether an env variable preset in a docker-compose file has a certain value, and if so, export another env variable after doing some things. I have created an extra method for that, with the following code:
env = os.environ['ENV']
user = os.environ['USER'].replace('.', '_')
if env == 'dev':
    logging.info('Development environment detected')
    dev_db_name = f'{env}_{user}'
    try:
        if os.environ['DEV_DATABASE_CREATED'] == 'True':
            logging.info('Dev database already exists')
    except KeyError:
        self.run(f"""CREATE DATABASE {dev_db_name} CLONE {self.database}""")
        os.environ['DEV_DATABASE_CREATED'] = 'True'
        logging.info(f'Dev database {dev_db_name} created')
    self.run(f"USE DATABASE {dev_db_name};")
self.run(sql, autocommit, parameters)
This code checks whether the env variable ENV is 'dev', and if so, attempts to create a new database and exports the env variable DEV_DATABASE_CREATED. The problem is that the exported variable doesn't persist. The database is cloned and the log message ('Dev database {dev_db_name} created') is shown, but the next time I execute the SnowflakeHook I get a KeyError, and the CREATE DATABASE then fails because the database it is trying to create already exists.
So, is there a way to make DEV_DATABASE_CREATED persist?
Upvotes: 1
Views: 527
Reputation: 20097
An environment variable exported inside a task won't persist, because each task runs in its own process and the change dies when that process exits. You can store the flag as an Airflow Variable instead (https://airflow.apache.org/docs/apache-airflow/stable/concepts/variables.html) - Variables are persisted in the Airflow metadata DB.
Note, however, that this only works for values that are not "dependent" on the data interval you are processing. Typically, Airflow DAG runs cover a specified data interval (hour, day, week, etc.) and there are multiple DAG runs - one per interval. If a value is "the same" across multiple data intervals of the same DAG, Airflow Variables are the right place to store it, as sketched below.
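For instance, here is a minimal sketch of the relevant part of your hook method rewritten to keep the flag in a Variable rather than the process environment (the key name 'dev_database_created' is just an illustration, not something your hook defines):

from airflow.models import Variable

# Check the persisted flag; default_var=None avoids raising when it is unset.
if Variable.get('dev_database_created', default_var=None) == 'True':
    logging.info('Dev database already exists')
else:
    self.run(f"""CREATE DATABASE {dev_db_name} CLONE {self.database}""")
    # Stored in the Airflow metadata DB, so it survives across task processes.
    Variable.set('dev_database_created', 'True')
    logging.info(f'Dev database {dev_db_name} created')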
If, on the other hand, the value depends on the data interval, it should be stored as an XCom: https://airflow.apache.org/docs/apache-airflow/stable/concepts/xcoms.html . One of the tasks (usually the first) should generate the value and push it as an XCom, and the other tasks should read it from XCom. The advantage of this approach is that it is "idempotent": there can be a different XCom value for each interval, so you can re-run past data intervals without affecting other intervals, as each data interval has its own "space" of values to use and operate on. See the sketch below.
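A minimal sketch of that pattern with the Airflow 2 TaskFlow API (the DAG id, task names and the derived database name are illustrative, not from your code); the return value of the first task is pushed as an XCom automatically and pulled by the second:

import pendulum
from airflow.decorators import dag, task

@dag(schedule_interval='@daily',
     start_date=pendulum.datetime(2022, 1, 1),
     catchup=False)
def dev_db_per_interval():

    @task
    def create_dev_db(ds=None):
        # ds ('YYYY-MM-DD') is injected from the task context, so each
        # DAG run derives its own interval-specific database name.
        db_name = f"dev_{ds.replace('-', '_')}"
        # ... create/clone the database here ...
        return db_name  # pushed to XCom automatically

    @task
    def run_queries(db_name):
        # db_name is pulled from the XCom of create_dev_db for this run.
        print(f'USE DATABASE {db_name};')

    run_queries(create_dev_db())

dev_db_per_interval()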
Upvotes: 2