I am testing Spark-3.3.0-without-Hadoop with TPC-DS, following spark-tpcds-datagen. This Spark runs on my Hadoop 3.2 installation.
The data was generated and uploaded with -put to hdfs://xxx/tpcds/data330.
When I run:
./SPARK/bin/spark-submit \
--master yarn \
--deploy-mode client \
--queue tpcdsqueue \
--class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
~/tpcds/spark-sql_2.12-3.3.0-tests.jar \
--data-location hdfs://xxx/tpcds/data330 --query-filter "q1"
it runs fine and reports the expected timing results:
Stopped after 2 iterations, 2691 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_211-b12 on Linux 3.10.0-862.el7.x86_64
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
TPCDS Snappy:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
q1                       1167           1346         253         0.0      Infinity       1.0X
However, the job does not seem to run on YARN, which means the following three settings have no effect:
--master yarn
--deploy-mode client
--queue tpcdsqueue
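It turns out that the source code forces a local master: getSparkSession calls setMaster(System.getProperty("spark.sql.test.master", "local[1]")), so the master is local[1] unless that property is set, and a value set on SparkConf in code takes precedence over --master passed to spark-submit.
Changing getSparkSession in org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark as below makes it work: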
override def getSparkSession: SparkSession = {
  // Start from whatever spark-submit supplied (--master, --deploy-mode, --queue, ...).
  val conf = new SparkConf()
  // Commented out: setMaster here would override --master from spark-submit (default local[1]).
  // .setMaster(System.getProperty("spark.sql.test.master", "local[1]"))
  // .set("spark.sql.parquet.compression.codec", "snappy")
  // .set("spark.sql.shuffle.partitions", System.getProperty("spark.sql.shuffle.partitions", "4"))
  // .set("spark.driver.memory", "3g")
  // .set("spark.executor.memory", "3g")
  // .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
  // .set("spark.sql.crossJoin.enabled", "true")
  SparkSession.builder.config(conf).getOrCreate()
}
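Commenting out setMaster entirely means a run outside spark-submit has no master at all. If you would rather keep a local default, one option is to pick the master explicitly. Below is a minimal sketch; BenchmarkMaster and resolveMaster are hypothetical names, not part of Spark. It assumes that in client mode spark-submit exposes its settings, including spark.master, to the driver JVM as system properties. It keeps the benchmark's own spark.sql.test.master override first, then uses the spark-submit master, and only then falls back to local[1].

```scala
// Hypothetical helper (not part of Spark): choose the master for the benchmark.
object BenchmarkMaster {
  /** Resolution order (an assumption, not Spark's behavior):
    *  1. spark.sql.test.master, the benchmark's own explicit override
    *  2. spark.master, assumed to be set as a system property by spark-submit
    *  3. local[1], the benchmark's original default
    */
  def resolveMaster(props: collection.Map[String, String]): String =
    props.get("spark.sql.test.master")
      .orElse(props.get("spark.master"))
      .getOrElse("local[1]")

  def main(args: Array[String]): Unit =
    // sys.props is a mutable Map view of the JVM system properties.
    println(s"Resolved master: ${resolveMaster(sys.props)}")
}
```

Inside getSparkSession this would be used as new SparkConf().setMaster(BenchmarkMaster.resolveMaster(sys.props)), so --master yarn would be honored under spark-submit while a bare run still gets local[1].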