Dobob

Reputation: 798

spark-submit using YARN master programmatically not working

I am using Apache Spark 2.1.0. If I do:

$ spark-submit --master yarn main.py

the Spark Python application executes on YARN properly and shows up in the YARN web UI as a finished application.

If I do it programmatically, it doesn't show up in the YARN UI, so I assume it isn't actually using YARN as the master:

from pyspark import SparkContext, SparkConf
import os

from pyspark.sql import *
from pyspark.sql.types import *

def read_cluster_file(file_path, spark, table_name):
    cluster_data = spark.read.csv(file_path, header=True, mode="DROPMALFORMED")    
    cluster_data.createOrReplaceTempView(table_name)

    return cluster_data

def main():
    spark = SparkSession.builder.master("yarn").appName("gene_cluster").getOrCreate()
    dir = os.path.dirname(__file__)
    cluster_data = read_cluster_file("file:" + dir + "/gene_cluster.csv", spark, "cluster")
    result_df = spark.sql("SELECT `subunits(Entrez IDs)` FROM cluster")
    result_df.show()

if __name__ == '__main__':
    main()

How do I make my Spark application run with YARN master programmatically in Python?

I have tried:

Upvotes: 2

Views: 4364

Answers (1)

Srinivas Jill

Reputation: 179

I was facing the same issue on HDP 2.5. I am using the SparkSession API, and even though I set the master to 'yarn', the SparkContext is created in local mode and none of my YARN-related configuration takes effect. I also checked whether there was any issue with the cluster setup by submitting a sample application with spark-submit, using the command below.

spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi /usr/hdp/2.5.0.0-1245/spark/lib/spark-examples-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar 10

The job executed fine, and from the Spark Web UI I can clearly see that it ran as a cluster job with the executors distributed among my worker nodes.

After digging a bit into the Spark code, it seems that with pyspark the SparkContext is created without taking the configuration into consideration, and the configuration is applied only after it has been created. Some specific configurations can still take effect after the SparkContext is initialised, but others have to be set properly at initialisation time. I finally managed to run the job on YARN by setting PYSPARK_SUBMIT_ARGS.

export PYSPARK_SUBMIT_ARGS="--master yarn pyspark-shell"

Look into java_gateway.py in the pyspark source for further understanding. We will soon be moving to HDP 3.0 and will update whether this is still necessary in the latest version.
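
For completeness, here is a minimal sketch of applying the same workaround from inside the Python script itself, assuming HADOOP_CONF_DIR/YARN_CONF_DIR are already set in your environment. The variable has to be set before pyspark launches the JVM gateway, i.e. before the SparkSession is created:

import os

# Must be set before the SparkSession/SparkContext is created, otherwise it has
# no effect; the trailing "pyspark-shell" token is required by java_gateway.py.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master yarn --deploy-mode client pyspark-shell"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gene_cluster").getOrCreate()
print(spark.sparkContext.master)  # should report 'yarn' if the args were picked up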

Upvotes: 1
