Alexey Romanyuk
Alexey Romanyuk

Reputation: 11

Airflow executes my python dag-script once a second (one by one) ignoring min_file_process_interval (=30 sec)

Hi all airflow specialists

I have only one DAG-script which is using for my platform purposes.

My airflow only on dev (on qa and prod all is ok) has strange behaviour: when scheduler starts to execute my dag at this moment scheduler reads and executes my python scripts each seconds instead of 30 seconds.

After execution of DAG (and couple minutes after) scheduler returns to usual behaviour - read my script each 30 seconds (as min_file_process_interval tells).

This happens only in one installation (DEV), in other all is ok (airflow reads and executes the same my script once per 30 seconds).

I create some crutch which is check how much time has passed from previous script reading:

import pendulum
START_DATE, CURRECT_DATE = (pendulum.now('UTC'), ) * 2
    from airflow.configuration import conf
    min_file_process_interval=conf.get("core", "min_file_process_interval")
    ths_scrpt_last_run_at = Variable.get('CMOCEAN_BATCHFLOW_ORCHESTRATOR_LAST_RUN_AT', default_var=None)
    if ths_scrpt_last_run_at:
        ths_scrpt_last_run_at = parser.parse(ths_scrpt_last_run_at)
    else:
        #its for script 1st time start, when you havnt my variable
        some_var=Variable.set('CMOCEAN_BATCHFLOW_ORCHESTRATOR_LAST_RUN_AT',str(START_DATE))
        ths_scrpt_last_run_at=CURRECT_DATE - 2* timedelta(seconds=min_file_process_interval)

    if ths_scrpt_last_run_at > CURRECT_DATE - timedelta(seconds=min_file_process_interval):
        #Started earlier than needed [min_file_process_interval], its wrong behaviour
        print('Period '+str(min_file_process_interval)+' seconds has not finished yet. Stop the script execution')
        sys.exit("Period "+str(min_file_process_interval)+" sec is not finished, stop the script execution")
    else:
        #This is fine, got needed timeout.Just remember script start time to variable (for next executions)
        some_var=Variable.delete('CMOCEAN_BATCHFLOW_ORCHESTRATOR_LAST_RUN_AT')
        some_var=Variable.set('CMOCEAN_BATCHFLOW_ORCHESTRATOR_LAST_RUN_AT',str(START_DATE))
    

So, when problem moment starts I see in my log messages "Period 30 seconds has not finished yet. Stop the script execution":

[2022-11-16 08:44:45,076] {processor.py:153} INFO - Started process (PID=52935) to work on /opt/bitnami/airflow/dags/some_dir/orchestrator.py
[2022-11-16 08:44:45,077] {processor.py:641} INFO - Processing file /opt/bitnami/airflow/dags/some_dir/orchestrator.py for tasks to queue
[2022-11-16 08:44:45,077] {logging_mixin.py:115} INFO - [2022-11-16 08:44:45,077] {dagbag.py:507} INFO - Filling up the DagBag from /opt/bitnami/airflow/dags/some_dir/orchestrator.py
[2022-11-16 08:44:45,092] {logging_mixin.py:115} INFO - ========================================== NEW START ==========================================
[2022-11-16 08:44:45,092] {logging_mixin.py:115} INFO - Scheduler works
[2022-11-16 08:44:45,120] {logging_mixin.py:115} INFO - Period 30 seconds has not finished yet. Stop the script execution
[2022-11-16 08:44:46,130] {processor.py:153} INFO - Started process (PID=52936) to work on /opt/bitnami/airflow/dags/some_dir/orchestrator.py
[2022-11-16 08:44:46,131] {processor.py:641} INFO - Processing file /opt/bitnami/airflow/dags/some_dir/orchestrator.py for tasks to queue
[2022-11-16 08:44:46,132] {logging_mixin.py:115} INFO - [2022-11-16 08:44:46,132] {dagbag.py:507} INFO - Filling up the DagBag from /opt/bitnami/airflow/dags/some_dir/orchestrator.py
[2022-11-16 08:44:46,147] {logging_mixin.py:115} INFO - ========================================== NEW START ==========================================
[2022-11-16 08:44:46,147] {logging_mixin.py:115} INFO - Scheduler works
[2022-11-16 08:44:46,170] {logging_mixin.py:115} INFO - Period 30 seconds has not finished yet. Stop the script execution

As you can see its not some cicle in my script, it is scheduler repeats the reading of my script (because PIDs is different, as usual in 30 seconds mode. In example is PID=52935, PID=52936)

As you can see dagbag_import_timeout parameter is got from config successfully (I use it in my script and put it to log via print), but scheduler is just ignoring it at the issue moment.

Airflow version is 2.3.2

My config:

[core]
dags_folder=/opt/bitnami/airflow/dags
hostname_callable=socket.getfqdn
default_timezone=utc
executor=CeleryExecutor
parallelism=32
max_active_tasks_per_dag=16
dags_are_paused_at_creation=True
max_active_runs_per_dag=16
load_examples=False
plugins_folder=/opt/bitnami/airflow/plugins
execute_tasks_new_python_interpreter=False
fernet_key=FlXNrJzmw-2VrOBAd8dqFBNJX4DH1SZTdPq9FFMZoQo=
donot_pickle=True
dagbag_import_timeout=30.0
dagbag_import_error_tracebacks=True
dagbag_import_error_traceback_depth=2
dag_file_processor_timeout=50
task_runner=StandardTaskRunner
default_impersonation=
security=
unit_test_mode=False
enable_xcom_pickling=False
killed_task_cleanup_time=60
dag_run_conf_overrides_params=True
dag_discovery_safe_mode=True
dag_ignore_file_syntax=regexp
default_task_retries=0
default_task_weight_rule=downstream
default_task_execution_timeout=
min_serialized_dag_update_interval=30
compress_serialized_dags=False
min_serialized_dag_fetch_interval=10
max_num_rendered_ti_fields_per_task=30
check_slas=True
xcom_backend=airflow.models.xcom.BaseXCom
lazy_load_plugins=True
lazy_discover_providers=True
hide_sensitive_var_conn_fields=True
sensitive_var_conn_names=
default_pool_task_slot_count=128
max_map_length=1024

[database]
sql_alchemy_conn=postgresql+psycopg2://airflow:airflow@airflow-dev-postgresql:5432/airflow
sql_engine_encoding=utf-8
sql_alchemy_pool_enabled=True
sql_alchemy_pool_size=5
sql_alchemy_max_overflow=10
sql_alchemy_pool_recycle=1800
sql_alchemy_pool_pre_ping=True
sql_alchemy_schema=
load_default_connections=True
max_db_retries=3

[logging]
base_log_folder=/opt/bitnami/airflow/logs
remote_logging=False
remote_log_conn_id=
google_key_path=
remote_base_log_folder=
encrypt_s3_logs=False
logging_level=INFO
celery_logging_level=
fab_logging_level=WARNING
logging_config_class=
colored_console_log=True
colored_log_format=[%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s
colored_formatter_class=airflow.utils.log.colored_log.CustomTTYColoredFormatter
log_format=[%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format=%%(asctime)s %%(levelname)s - %%(message)s
task_log_prefix_template=
log_filename_template=dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/{%% if ti.map_index >= 0 %%}map_index={{ ti.map_index }}/{%% endif %%}attempt={{ try_number }}.log
log_processor_filename_template={{ filename }}.log
dag_processor_manager_log_location=/opt/bitnami/airflow/logs/dag_processor_manager/dag_processor_manager.log
task_log_reader=task
extra_logger_names=
worker_log_server_port=8793

[metrics]
statsd_on=False
statsd_host=localhost
statsd_port=8125
statsd_prefix=airflow
statsd_allow_list=
stat_name_handler=
statsd_datadog_enabled=False
statsd_datadog_tags=

[secrets]
backend=
backend_kwargs=

[cli]
api_client=airflow.api.client.local_client
endpoint_url=http://localhost:8080

[debug]
fail_fast=False

[api]
enable_experimental_api=False
auth_backends=airflow.api.auth.backend.session
maximum_page_limit=100
fallback_page_limit=100
google_oauth2_audience=
google_key_path=
access_control_allow_headers=
access_control_allow_methods=
access_control_allow_origins=

[lineage]
backend=

[atlas]
sasl_enabled=False
host=
port=21000
username=
password=

[operators]
default_owner=airflow
default_cpus=1
default_ram=512
default_disk=512
default_gpus=0
default_queue=default
allow_illegal_arguments=False

[hive]
default_hive_mapred_queue=

[webserver]
base_url=http://localhost:8080
default_ui_timezone=UTC
web_server_host=0.0.0.0
web_server_port=8080
web_server_ssl_cert=
web_server_ssl_key=
session_backend=database
web_server_master_timeout=120
web_server_worker_timeout=120
worker_refresh_batch_size=1
worker_refresh_interval=6000
reload_on_plugin_change=False
secret_key=a1pjQkdXZTRtYjFDOENlRklTYld6SVl2NjlMUVJORXY=
workers=4
worker_class=sync
access_logfile=-
error_logfile=-
access_logformat=
expose_config=False
expose_hostname=True
expose_stacktrace=True
dag_default_view=grid
dag_orientation=LR
log_fetch_timeout_sec=5
log_fetch_delay_sec=2
log_auto_tailing_offset=30
log_animation_speed=1000
hide_paused_dags_by_default=False
page_size=100
navbar_color=
default_dag_run_display_number=25
enable_proxy_fix=False
proxy_fix_x_for=1
proxy_fix_x_proto=1
proxy_fix_x_host=1
proxy_fix_x_port=1
proxy_fix_x_prefix=1
cookie_secure=False
cookie_samesite=Lax
default_wrap=False
x_frame_enabled=True
show_recent_stats_for_completed_runs=True
update_fab_perms=True
session_lifetime_minutes=43200
instance_name_has_markup=False
auto_refresh_interval=3
warn_deployment_exposure=True
audit_view_excluded_events=gantt,landing_times,tries,duration,calendar,graph,grid,tree,tree_data

[email]
email_backend=airflow.utils.email.send_email_smtp
email_conn_id=smtp_default
default_email_on_retry=True
default_email_on_failure=True

[smtp]
smtp_host=localhost
smtp_starttls=True
smtp_ssl=False
smtp_port=25
[email protected]
smtp_timeout=30
smtp_retry_limit=5

[sentry]
sentry_on=false
sentry_dsn=

[local_kubernetes_executor]
kubernetes_queue=kubernetes

[celery_kubernetes_executor]
kubernetes_queue=kubernetes

[celery]
celery_app_name=airflow.executors.celery_executor
worker_concurrency=16
worker_prefetch_multiplier=1
worker_enable_remote_control=true
worker_umask=0o077
broker_url=redis://:otUjs01rLS@airflow-dev-redis-master:6379/1
result_backend=db+postgresql://airflow:airflow@airflow-dev-postgresql:5432/airflow
flower_host=0.0.0.0
flower_url_prefix=
flower_port=5555
flower_basic_auth=
sync_parallelism=0
celery_config_options=airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
ssl_active=False
ssl_key=
ssl_cert=
ssl_cacert=
pool=prefork
operation_timeout=1.0
task_track_started=True
task_adoption_timeout=600
stalled_task_timeout=0
task_publish_max_retries=3
worker_precheck=False

[celery_broker_transport_options]

[dask]
cluster_address=127.0.0.1:8786
tls_ca=
tls_cert=
tls_key=

[scheduler]
job_heartbeat_sec=5
scheduler_heartbeat_sec=5
num_runs=-1
scheduler_idle_sleep_time=1
min_file_process_interval=30
deactivate_stale_dags_interval=60
dag_dir_list_interval=300
print_stats_interval=30
pool_metrics_interval=5.0
scheduler_health_check_threshold=30
orphaned_tasks_check_interval=300.0
child_process_log_directory=/opt/bitnami/airflow/logs/scheduler
scheduler_zombie_task_threshold=300
zombie_detection_interval=10.0
catchup_by_default=True
ignore_first_depends_on_past_by_default=True
max_tis_per_query=512
use_row_level_locking=True
max_dagruns_to_create_per_loop=10
max_dagruns_per_loop_to_schedule=20
schedule_after_task_execution=True
parsing_processes=2
file_parsing_sort_mode=modified_time
standalone_dag_processor=False
max_callbacks_per_loop=20
use_job_schedule=True
allow_trigger_in_future=False
dependency_detector=airflow.serialization.serialized_objects.DependencyDetector
trigger_timeout_check_interval=15

[triggerer]
default_capacity=1000

[kerberos]
ccache=/tmp/airflow_krb5_ccache
principal=airflow
reinit_frequency=3600
kinit_path=kinit
keytab=airflow.keytab
forwardable=True
include_ip=True

[github_enterprise]
api_rev=v3

[elasticsearch]
host=
log_id_template={dag_id}-{task_id}-{run_id}-{map_index}-{try_number}
end_of_log_mark=end_of_log
frontend=
write_stdout=False
json_format=False
json_fields=asctime, filename, lineno, levelname, message
host_field=host
offset_field=offset

[elasticsearch_configs]
use_ssl=False
verify_certs=True

[kubernetes]
pod_template_file=
worker_container_repository=
worker_container_tag=
namespace=default
delete_worker_pods=True
delete_worker_pods_on_failure=False
worker_pods_creation_batch_size=1
multi_namespace_mode=False
in_cluster=True
kube_client_request_args=
delete_option_kwargs=
enable_tcp_keepalive=True
tcp_keep_idle=120
tcp_keep_intvl=30
tcp_keep_cnt=6
verify_ssl=True
worker_pods_pending_timeout=300
worker_pods_pending_timeout_check_interval=120
worker_pods_queued_check_interval=60
worker_pods_pending_timeout_batch_size=100

[sensors]
default_timeout=604800

[smart_sensor]
use_smart_sensor=False
shard_code_upper_limit=10000
shards=5
sensors_enabled=NamedHivePartitionSensor

What the strange behaviour here?

I tied google its problem but I cant find some people with same strange airflow behaviour.

Upvotes: 1

Views: 539

Answers (1)

Hussein Awala
Hussein Awala

Reputation: 5096

dagbag_import_timeout is how long the DagFileProcessor can try to import a dag python script before timing out. So in your case, DagFileProcessor fails when it exceeds the 30s trying to import the dag. For each task operation (schedule, queued, run, ...), Airflow try to parse the dag script which contains the task, so if you have multiple parallel tasks, and your dag script contains a code which takes much time to be executed (reading from external service or a loop,...), you may have this problem in your log.

I suggest increasing dagbag_import_timeout to 2-5 minutes, and increasing the time between 2 DagFileProcessorProcess by setting the conf min_file_process_interval.

Upvotes: 1

Related Questions