Kishore

Reputation: 5881

How to pass file as an argument in SparkSubmitOperator in Apache Airflow

Spark submit command

spark-submit --class=com.test.App  /home/ubuntu/app.jar /home/ubuntu/abc.properties

Airflow script to schedule spark job

from airflow import DAG
from datetime import datetime, timedelta
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
import sys
import os
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator

current_date = datetime.now() 

default_args = {
    'owner': 'airflow',
    'catchup' : False,
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 4, 13, 22),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'test', default_args=default_args, schedule_interval=timedelta(minutes=5))

spark_task1 = SparkSubmitOperator(
    task_id='LoadRawPOPToCassandra',
    application='/home/ubuntu/app.jar',
    java_class='com.test.App',
    application_args="/home/ubuntu/abc.properties",
    dag=dag)

spark_task1

It gives the error airflow.exceptions.AirflowException: SparkSubmitOperator is treating each character of the file name (application_args) as a separate argument.

How can I pass a file path as an argument to the SparkSubmitOperator? I tried the files parameter instead of application_args, but I get the same error. I am running Spark in local mode.

Upvotes: 0

Views: 5833

Answers (1)

Alessandro Cosentino

Reputation: 2364

As per the documentation, the application_args argument of the SparkSubmitOperator takes a list, not a string, so what you want to pass is the following:

application_args=["/home/ubuntu/abc.properties"],
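For reference, the task definition from the question then becomes the following (a minimal sketch, keeping the same jar path, class name, and task id from the question):

spark_task1 = SparkSubmitOperator(
    task_id='LoadRawPOPToCassandra',
    application='/home/ubuntu/app.jar',
    java_class='com.test.App',
    # each list element is passed as one command-line argument to the main class
    application_args=['/home/ubuntu/abc.properties'],
    dag=dag)

Each element of the list becomes one positional argument to the jar, which matches how the spark-submit command in the question passes /home/ubuntu/abc.properties after the application path.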

Upvotes: 2
