AppleCEO

Reputation: 73

Does the Spark TPCDS benchmark support running on YARN?

I am testing Spark-3.3.0-without-Hadoop with TPCDS, following spark-tpcds-datagen. Spark runs on my Hadoop-3.2 cluster.

The data is generated and uploaded (hdfs dfs -put) to hdfs://xxx/tpcds/data330

When I run :

./SPARK/bin/spark-submit \
  --master yarn \             # not working
  --deploy-mode client \      # not working
  --queue tpcdsqueue \        # not working
  --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
  ~/tpcds/spark-sql_2.12-3.3.0-tests.jar \
  --data-location hdfs://xxx/tpcds/data330 --query-filter "q1"

It runs fine and returns the expected timing results:

Stopped after 2 iterations, 2691 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_211-b12 on Linux 3.10.0-862.el7.x86_64
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q1                                                 1167           1346         253          0.0      Infinity       1.0X

but apparently not on YARN, which means the following three settings have no effect:

--master yarn \
--deploy-mode client  \
--queue tpcdsqueue \ 
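One quick way to check whether the submission actually reached YARN is to list running applications on the ResourceManager while the benchmark executes (this assumes the yarn CLI from the same Hadoop-3.2 install is on the PATH):

```shell
# If --master yarn took effect, the benchmark appears here on queue
# tpcdsqueue; when Spark silently runs in local mode, it does not.
yarn application -list -appStates RUNNING
```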

Upvotes: 0

Views: 87

Answers (1)

AppleCEO

Reputation: 73

It turns out the benchmark's source code hardcodes local mode.

So commenting out the setMaster call in getSparkSession of org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark, as below, makes it work:

  override def getSparkSession: SparkSession = {
    val conf = new SparkConf()
      // setMaster is commented out so the --master passed to spark-submit takes effect
//      .setMaster(System.getProperty("spark.sql.test.master", "local[1]"))
      .setAppName("test-sql-context")
//      .set("spark.sql.parquet.compression.codec", "snappy")
//      .set("spark.sql.shuffle.partitions", System.getProperty("spark.sql.shuffle.partitions", "4"))
//      .set("spark.driver.memory", "3g")
//      .set("spark.executor.memory", "3g")
//      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
//      .set("spark.sql.crossJoin.enabled", "true")

    SparkSession.builder.config(conf).getOrCreate()
  }
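The fix works because spark-submit sets --master as the spark.master property, which new SparkConf() picks up, and an explicit setMaster in code then overrides it. As an alternative sketch (object and method names here are mine, not the benchmark's), one could fall back to local mode only when spark-submit supplied no master, so the same jar works both locally and on YARN:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: keep local[1] as a fallback instead of unconditionally
// overriding whatever master spark-submit passed in.
object TpcdsSessionSketch {
  def getSparkSession: SparkSession = {
    val conf = new SparkConf().setAppName("test-sql-context")
    // Only set a master when none was provided via spark-submit.
    if (!conf.contains("spark.master"))
      conf.setMaster(System.getProperty("spark.sql.test.master", "local[1]"))
    SparkSession.builder.config(conf).getOrCreate()
  }
}
```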

Upvotes: 0
