Spark-submit command:
spark-submit --class=com.test.App /home/ubuntu/app.jar /home/ubuntu/abc.properties
Airflow script to schedule the Spark job:
from airflow import DAG
from datetime import datetime, timedelta
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
import sys
import os
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator
current_date = datetime.now()
default_args = {
    'owner': 'airflow',
    'catchup': False,
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 4, 13, 22),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'test', default_args=default_args, schedule_interval=timedelta(minutes=5))

spark_task1 = SparkSubmitOperator(
    task_id='LoadRawPOPToCassandra',
    application='/home/ubuntu/app.jar',
    java_class='com.test.App',
    application_args="/home/ubuntu/abc.properties",
    dag=dag)
spark_task1
This raises airflow.exceptions.AirflowException: SparkSubmitOperator takes each character of the file name (application_args) as a separate argument.
How can I pass a file path as an argument to SparkSubmitOperator? I tried files instead of application_args, but got the same error. I am running Spark in local mode.
Upvotes: 0
Views: 5833
As per the documentation, the application_args argument of SparkSubmitOperator takes a list, not a string, so what you want to pass is the following:
application_args=["/home/ubuntu/abc.properties"],
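Passing a plain string also explains the symptom: the hook iterates over application_args when building the spark-submit command, and iterating over a Python string yields one character at a time, so every character became its own argument. For reference, here is the task from the question with just that one change (a sketch; the import shown is the Airflow 1.x contrib path, everything else is taken verbatim from your DAG):

from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

spark_task1 = SparkSubmitOperator(
    task_id='LoadRawPOPToCassandra',
    application='/home/ubuntu/app.jar',
    java_class='com.test.App',
    # must be a list of strings, one element per spark-submit argument
    application_args=['/home/ubuntu/abc.properties'],
    dag=dag)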
Upvotes: 2