Spark-java multithreading vs running individual spark jobs

Question

I am new to Spark and am trying to understand the performance difference between the approaches below (Spark on Hadoop).

Scenario: as part of batch processing, I have 50 Hive queries to run. Some can run in parallel and some must run sequentially.

- First approach

All of the queries can be stored in a Hive table, and I can write a Spark driver that reads all of them at once and runs them in parallel (with HiveContext) using Java multithreading.

Pros: easy to maintain.
Cons: the queries may occupy all available resources, and tuning performance for each query individually can be tough.

- Second approach

Run each query individually using Oozie Spark actions.

Pros: optimization can be done at the query level.
Cons: tough to maintain.

I couldn't find any documentation on how Spark processes the queries internally in the first approach. From a performance point of view, which approach is better?

The only thing I could find on Spark multithreading is: "within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads"
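
To make the first approach concrete, here is a minimal sketch of the threading pattern I have in mind. It is only an illustration under assumptions: `runQuery` is a hypothetical stand-in for something like `hiveContext.sql(query)` followed by an action, and the class name, the grouping of queries into stages, and the pool size are made up for the example. Queries within a stage are submitted from separate threads, and stages run one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelQuerySketch {

    // Runs the stages in order; queries inside one stage run concurrently.
    // `runQuery` is a hypothetical placeholder for the real Spark call,
    // e.g. q -> hiveContext.sql(q).count() in an actual driver (assumption).
    static List<Long> runStages(List<List<String>> stages,
                                Function<String, Long> runQuery,
                                int maxThreads)
            throws InterruptedException, ExecutionException {
        // The pool size caps how many queries are in flight at once.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        List<Long> results = new ArrayList<>();
        try {
            for (List<String> stage : stages) {
                List<Future<Long>> futures = new ArrayList<>();
                for (String query : stage) {
                    // Each submit hands one query to a worker thread.
                    futures.add(pool.submit(() -> runQuery.apply(query)));
                }
                // Wait for the whole stage before starting the next one.
                for (Future<Long> future : futures) {
                    results.add(future.get());
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Two queries that may run in parallel, then one that must follow them.
        List<List<String>> stages = Arrays.asList(
            Arrays.asList("SELECT 1", "SELECT 22"),
            Arrays.asList("SELECT 333"));
        // Fake runner so the sketch works without a cluster: returns the query length.
        Function<String, Long> fakeRunner = q -> (long) q.length();
        System.out.println(runStages(stages, fakeRunner, 4));
    }
}
```

In this sketch the thread pool size is the only knob limiting how many queries compete for the application's resources at the same time, which is the resource concern listed under the first approach.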

Thanks in advance
