Raghav

Reputation: 2238

Timing a Spark process and killing it if it's too slow

I am building a process for launching user-built queries (business rules) using Scala-Spark/SQL. One of the requirements is that if a SQL query performs slower than expected (every rule has an expected-performance attribute, in seconds), I need to flag it as slow for future reference, and also kill the long-running (slow) process/job.

So far I am thinking of the following approach -

  1. Start timing.
  2. Launch the job in a Scala Future (a separate thread).
  3. Wait up to the expected time for the job.
  4. If the thread hasn't completed within the expected time, kill the job and report it as a slow process (a rough sketch of this follows the list).
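
A minimal sketch of steps 2-4, assuming each rule supplies an id, its SQL text and an expected time in seconds (runWithBudget, ruleId and the other names are placeholders of mine, not from any library). It relies on Spark's job-group cancellation (setJobGroup / cancelJobGroup), so a timeout cancels only that rule's jobs rather than the whole application:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

import org.apache.spark.sql.SparkSession

object TimedQueryRunner {

  /** Run one rule's SQL with a time budget; returns true if it finished in time.
    * ruleId, sql and expectedSeconds stand in for whatever the rule metadata provides.
    */
  def runWithBudget(spark: SparkSession, ruleId: String, sql: String, expectedSeconds: Long): Boolean = {
    val sc = spark.sparkContext

    val work: Future[Long] = Future {
      // Tag every job triggered from this thread so they can be cancelled as a unit.
      sc.setJobGroup(ruleId, s"business rule $ruleId", interruptOnCancel = true)
      val start = System.nanoTime()
      spark.sql(sql).count() // force execution of the (lazy) query
      (System.nanoTime() - start) / 1000000000L
    }

    Try(Await.result(work, expectedSeconds.seconds)) match {
      case Success(elapsed) =>
        println(s"Rule $ruleId finished in $elapsed s (budget: $expectedSeconds s)")
        true
      case Failure(_) =>
        // Timed out (or failed): cancel every job in this rule's group and flag it as slow.
        sc.cancelJobGroup(ruleId)
        println(s"Rule $ruleId exceeded $expectedSeconds s; jobs cancelled, rule flagged as slow")
        false
    }
  }
}
```

Note that Await.result only stops the waiting; it is the cancelJobGroup call that actually interrupts the running Spark jobs, after which the Future itself completes with a failure.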

I am concerned that I am interfering with the distributed nature of the job. Another concern: since Spark will internally launch an unknown number of tasks across nodes for my "job" (running that query), how will the timing work, and what actual performance figure will be reported back to my program?

Suggestions, please.

Upvotes: 2

Views: 751

Answers (1)

elcomendante

Reputation: 1161

I suggest a different approach: build a streaming / scheduled batch application that updates state in a DB whenever new input data arrives, then provide a REST API that lets clients read that state for whatever query range they need. In my experience, allowing clients to launch a series of Spark jobs exposes you to huge operational overhead in managing their performance and volume, and therefore their effect on cluster resources. It is easier to tune, monitor and productionise your own queries (partitions, cores / executor numbers, optimal cluster resources) and manage a query REST API.

If that is not suitable for you, build a REST API that lets each user launch their own Spark job per query (examples: the Spark hidden REST API, the Spark Job Server), and then build an app that monitors the Spark UI and kills and relaunches a job if it runs too long; you can use the "kill spark job via spark ui" script as an example.

Your planned approach might be very hard to execute, since Spark itself launches multiple jobs internally, many of which are lazily evaluated, and timing each stage of execution is pretty hard. Perhaps you can use a Future to launch a Spark job per client query and monitor its length? A rough sketch of such a watchdog is below. I hope this helps.
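
As an illustration of the "monitor the Spark UI and kill" idea, here is a sketch of an external watchdog that polls the monitoring REST API exposed by the driver UI and kills the whole application if any job runs past its budget. The UI host, YARN application id and budget below are hypothetical placeholders, json4s (which ships with Spark) is assumed for JSON parsing, and killing via yarn application -kill assumes a YARN deployment:

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

import scala.io.Source
import scala.sys.process._

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Only the fields we need; extra fields in the REST responses are ignored by json4s.
case class AppInfo(id: String, name: String)
case class JobInfo(jobId: Int, status: String, submissionTime: Option[String])

object SlowAppWatchdog {
  implicit val formats: Formats = DefaultFormats

  val uiBase    = "http://driver-host:4040"       // Spark driver UI (placeholder)
  val yarnAppId = "application_1234567890_0001"   // YARN id of that app (placeholder)
  val budgetSec = 300L                            // expected-performance budget in seconds

  // Timestamps in the monitoring REST API look like "2018-01-01T10:15:30.000GMT".
  private def parseTs(ts: String): Long = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'GMT'")
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"))
    fmt.parse(ts).getTime
  }

  def main(args: Array[String]): Unit = {
    // GET /api/v1/applications, then /api/v1/applications/[app-id]/jobs
    val apps = parse(Source.fromURL(s"$uiBase/api/v1/applications").mkString)
      .extract[List[AppInfo]]
    val appId = apps.head.id

    val jobs = parse(Source.fromURL(s"$uiBase/api/v1/applications/$appId/jobs").mkString)
      .extract[List[JobInfo]]

    val now = System.currentTimeMillis()
    val overBudget = jobs.exists { j =>
      j.status == "RUNNING" &&
        j.submissionTime.exists(ts => (now - parseTs(ts)) / 1000 > budgetSec)
    }

    if (overBudget) {
      // Kill the whole YARN application; a scheduler around this script can relaunch it.
      s"yarn application -kill $yarnAppId".!
    }
  }
}
```

Run it from cron or a scheduler so it polls periodically; killing at application level is coarse, which is one more reason to prefer the per-query job-group approach inside a single long-running application.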

Upvotes: 1
