yuxh

Reputation: 954

How to submit many jobs to Spark in one application

I have a report stats project that uses Spark 2.1 (Scala); here is how it works:

object PtStatsDayApp extends App {
    Stats A...
    Stats B...
    Stats C...
     .....     
}

Someone put many stat computations (mostly unrelated to each other) in one class and submits it with a shell script. I find this has two problems:

Any other ideas or best practices? Thanks.

Upvotes: 0

Views: 653

Answers (2)

uh_big_mike_boi

Reputation: 3470

First, you can set the scheduler mode to FAIR. Then you can use parallel collections to launch simultaneous Spark jobs from a multithreaded driver. A parallel collection, let's say a parallel sequence (ParSeq) of your ten Stats queries, can use a foreach to fire off each of the Stats queries concurrently. How many run at the same time depends on how many cores the driver has, because by default the global execution context has that many threads.
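A minimal sketch of that idea, assuming each stat is refactored into a function that takes the SparkSession (the stat bodies below are placeholders, not your actual queries):

    import org.apache.spark.sql.SparkSession

    object PtStatsDayApp extends App {
      val spark = SparkSession.builder()
        .appName("PtStatsDay")
        .config("spark.scheduler.mode", "FAIR") // let concurrent jobs share executors fairly
        .getOrCreate()

      // Hypothetical stat functions standing in for Stats A, B, C...
      def statsA(s: SparkSession): Unit = s.range(1000).selectExpr("sum(id)").show()
      def statsB(s: SparkSession): Unit = s.range(1000).selectExpr("avg(id)").show()
      def statsC(s: SparkSession): Unit = s.range(1000).selectExpr("max(id)").show()

      // A parallel collection runs the foreach bodies on the global execution
      // context, so the driver submits these Spark jobs concurrently.
      val stats: Seq[SparkSession => Unit] = Seq(statsA, statsB, statsC)
      stats.par.foreach(stat => stat(spark))

      spark.stop()
    }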

Check out these posts; they are examples of launching concurrent Spark jobs with parallel collections.

Cache and Query a Dataset In Parallel Using Spark

Launching Apache Spark SQL jobs from multi-threaded driver

Upvotes: 1

Serge Harnyk

Reputation: 1339

There are several free third-party schedulers for Spark, like Airflow, but I suggest using the Spark Launcher API and writing the launching logic programmatically. With this API you can run your jobs in parallel, sequentially, or however you want.

Link to doc: https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/launcher/package-summary.html
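A minimal sketch of that approach, assuming each stat has been split into its own main class packaged in one jar (the jar path, class names, and master below are placeholders):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object StatsJobLauncher {
      // Hypothetical artifact and main classes -- replace with your own.
      val appJar = "/path/to/pt-stats.jar"
      val statMains = Seq("stats.StatsAJob", "stats.StatsBJob", "stats.StatsCJob")

      // Submit one stat job asynchronously and return a handle to track it.
      def launchStat(mainClass: String): SparkAppHandle =
        new SparkLauncher()
          .setAppResource(appJar)
          .setMainClass(mainClass)
          .setMaster("yarn")
          .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
          .startApplication()

      def main(args: Array[String]): Unit = {
        // Launch all stat jobs in parallel, then wait until each one finishes.
        val handles = statMains.map(launchStat)
        while (!handles.forall(_.getState.isFinal)) Thread.sleep(5000)
        handles.foreach(h => println(s"${h.getAppId}: ${h.getState}"))
      }
    }

Running the stats sequentially instead is just a matter of launching one handle at a time and waiting for its state to become final before starting the next.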

The efficiency of running your jobs in parallel mostly depends on your Spark cluster configuration. In general, Spark supports this kind of workload.

Upvotes: 1
