Pankaj_Pandav

Reputation: 79

Airflow scheduler throws an error for DAGs with schedule_interval set to None

I have an issue with Airflow. A custom generator script reads its input from YAML files and loads the DAGs. It works fine when every DAG's YAML file has a schedule interval other than "None". However, many DAGs have schedule_interval set to None, and a few of them use @once.

Here is an example YAML file:

cluster:
  nodes: 10
  subnet: "subnet-A"
  instance: "m4.2xlarge"
  configbucket: "bucketabc"
  jar: "s3://xxxxx.jar"
  conf: "app.conf"

schedule:
  state: "unpause"
  concurrency: 10
  startdate: "2050-08-05 00:00"
  cron: "None"

The relevant part of the generator script:

if "schedule" in project_settings:
    schedule_settings = project_settings["schedule"]
    concurrency = schedule_settings["concurrency"]
    cron = schedule_settings["cron"]
    startdate = datetime.strptime(schedule_settings["startdate"], "%Y-%m-%d %H:%M")

#print "my projectname is: " + project

dag = DAG(
    dag_id=project,
    default_args=args,
    user_defined_macros=user_macros,
    schedule_interval=cron,
    concurrency=concurrency,
    start_date=startdate
)
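To see exactly what the generator passes to DAG(), the settings step can be run on its own. This is a minimal sketch, not the actual generator: it uses a plain dict that stands in for what a YAML loader would return for the example file's "schedule" block, and it does not import Airflow.

```python
from datetime import datetime

# Stand-in for the loaded YAML "schedule" block from the example file.
# Because the YAML value is quoted, cron arrives as the str "None", not None.
project_settings = {
    "schedule": {
        "state": "unpause",
        "concurrency": 10,
        "startdate": "2050-08-05 00:00",
        "cron": "None",
    }
}

schedule_settings = project_settings["schedule"]
concurrency = schedule_settings["concurrency"]
cron = schedule_settings["cron"]
startdate = datetime.strptime(schedule_settings["startdate"], "%Y-%m-%d %H:%M")

# Show the value and its type as they would reach schedule_interval=...
print(repr(cron), type(cron).__name__)  # 'None' str
print(startdate)                        # 2050-08-05 00:00:00
```

The printed type is the important part: schedule_interval receives a four-character string, not Python's None.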

This is the error I get for the DAGs whose schedule_interval is None:

INFO - [2020-04-08 12:30:45,529] {dagbag.py:302} ERROR - Failed to bag_dag: /home/deploy/airflow/dags/genertor.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dagbag.py", line 296, in process_file
    croniter(dag._schedule_interval)
  File "/usr/local/lib/python3.6/site-packages/croniter/croniter.py", line 91, in __init__
    self.expanded, self.nth_weekday_of_month = self.expand(expr_format)
  File "/usr/local/lib/python3.6/site-packages/croniter/croniter.py", line 468, in expand
    raise CroniterBadCronError(cls.bad_length)
croniter.croniter.CroniterBadCronError: Exactly 5 or 6 columns has to be specified for iteratorexpression.
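The message says a cron expression must have 5 or 6 space-separated columns. The sketch below only illustrates that rule. The function name is hypothetical and this is not croniter's actual implementation. It shows why the string "None" is rejected: it splits into a single column.

```python
def cron_field_count_ok(expr):
    """Hypothetical helper: check only the 5-or-6 column rule from the error.

    This is NOT croniter's parser; it just splits on whitespace and counts
    the resulting fields.
    """
    return len(expr.split()) in (5, 6)


for expr in ["None", "0 6 * * *"]:
    print(repr(expr), cron_field_count_ok(expr))
# 'None' False
# '0 6 * * *' True
```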

Has anyone faced this issue?

Upvotes: 0

Views: 2537

Answers (1)

UJIN

Reputation: 1758

An Airflow DAG's schedule_interval can be a string holding a cron expression (or a preset such as "@once"), or it can be None (NB: Python's None, not the string "None").

In your settings you have:

cron: "None"

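which is loaded as the Python string "None", not the None object. Airflow therefore treats it as a cron expression, and croniter rejects it. If you can change the YAML file, use a YAML null instead. Note that an unquoted None is still read as the string "None", so write:

cron: null

(or cron: ~). If you cannot change the YAML files, you can convert the string in the generator script before creating the DAG: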

schedule_interval = None if cron == "None" else cron
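If the YAML files are inconsistent, a slightly more defensive version of that one-liner can be wrapped in a helper. This is a sketch under an assumption: the helper name normalize_schedule is hypothetical, and it treats any letter-case variant of "None" as well as a real YAML null as "no schedule". Every other value, such as cron strings or "@once", is passed through unchanged.

```python
def normalize_schedule(cron):
    """Hypothetical helper: map a YAML schedule value to a schedule_interval.

    Assumption: "None" (any case, surrounding spaces ignored) or a YAML null
    means "no schedule"; anything else is returned unchanged.
    """
    if cron is None:
        return None
    if isinstance(cron, str) and cron.strip().lower() == "none":
        return None
    return cron


print(normalize_schedule("None"))       # None
print(normalize_schedule(None))         # None
print(normalize_schedule("@once"))      # @once
print(normalize_schedule("0 6 * * *"))  # 0 6 * * *
```

In the generator this would be used as schedule_interval=normalize_schedule(cron) in the DAG(...) call.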

Upvotes: 2
