Reputation: 977
I have a Scala-Spark program, shown below.
Here the Scala objects Season, Product, Vendor, ..., Group are run in serial order (FIFO). Is there a way to make them run in parallel, i.e. submit all the jobs at once?
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Main extends App {
  var appName: String = "DataExtract"
  var master: String = "local[*]"

  val sparkConf: SparkConf = new SparkConf()
    .setAppName(appName)
    .setMaster(master)

  val spark: SparkSession = SparkSession
    .builder()
    .config(sparkConf)
    .getOrCreate()

  Season.run(spark)
  Product.run(spark)
  Vendor.run(spark)
  User.run(spark)
  ..
  ..
  .
  Group.run(spark)
}
Upvotes: 2
Views: 166
Reputation: 22605
To make the Spark jobs run asynchronously, you just need to wrap each of them in a Future:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val jobs = Future.sequence(
  List(
    Future(Season.run(spark)),
    Future(Product.run(spark)),
    Future(Vendor.run(spark)),
    Future(User.run(spark))
  )
)

// Block the main thread to allow the jobs running on other threads to finish.
// You can replace the finite duration with Duration.Inf.
Await.result(jobs, 1.hour)
Additionally, you'd have to set the Spark job scheduler to FAIR:
sparkConf.set("spark.scheduler.mode", "FAIR")
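For reference, here is a minimal sketch of how the two pieces could fit together in the original Main object. It assumes the Season, Product, Vendor, User and Group objects expose run(spark) exactly as in the question:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Main extends App {
  val sparkConf: SparkConf = new SparkConf()
    .setAppName("DataExtract")
    .setMaster("local[*]")
    .set("spark.scheduler.mode", "FAIR") // let concurrently submitted jobs share resources

  val spark: SparkSession = SparkSession
    .builder()
    .config(sparkConf)
    .getOrCreate()

  // Each run(spark) call is submitted from its own thread,
  // so Spark receives the jobs concurrently instead of one after another.
  // Season, Product, Vendor, User, Group are assumed to be the same objects as in the question.
  val jobs = Future.sequence(
    List(
      Future(Season.run(spark)),
      Future(Product.run(spark)),
      Future(Vendor.run(spark)),
      Future(User.run(spark)),
      Future(Group.run(spark))
    )
  )

  Await.result(jobs, Duration.Inf) // block the main thread until all jobs complete
  spark.stop()
}

Note that the global ExecutionContext sizes its thread pool by the number of available cores, which bounds how many jobs are actually submitted at the same time.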
Upvotes: 3