Kavitha

Reputation: 205

Multi-threading in Hadoop/Spark

I understand multi-threading in general, but I am not sure how it is used in Hadoop.

As far as I know, YARN is responsible for managing the resources of Spark/MapReduce jobs, so I can't see where multi-threading would come in. I am also not sure whether it is used anywhere else in the Hadoop ecosystem.

I would appreciate it if anybody could provide some information on this.

Many thanks,

Upvotes: 2

Views: 4484

Answers (2)

Prashil Sureja

Reputation: 149

It is possible to run multithreaded code in Spark. Take this example of Java code in a Spark driver program:

// anyCollection is a placeholder for any java.util.Collection
anyCollection.parallelStream().forEach(item -> {
    // Add your Spark code here.
});

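Depending on the number of cores available to the driver, the parallel stream runs the lambda on multiple driver threads, so any Spark jobs submitted inside it can run in parallel. The executors themselves are allocated by the cluster manager, not spawned by the stream.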
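To see what `parallelStream()` does on its own, here is a minimal sketch in plain Java with no Spark dependency. The class name `ParallelStreamThreads` and the method `threadPerItem` are made up for illustration. It records which thread handled each item; all of those threads live inside the one JVM that runs the code, which in a Spark application would be the driver.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelStreamThreads {

    // Hypothetical helper: maps each item to the name of the thread that processed it.
    static Map<Integer, String> threadPerItem(List<Integer> items) {
        Map<Integer, String> seen = new ConcurrentHashMap<>();
        items.parallelStream().forEach(item -> {
            // In a real driver program, a Spark action would be triggered here.
            seen.put(item, Thread.currentThread().getName());
        });
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Thread names vary from run to run and machine to machine.
        threadPerItem(items).forEach((item, thread) ->
                System.out.println("item " + item + " -> " + thread));
    }
}
```

On a multi-core machine the output typically shows `main` plus a few `ForkJoinPool.commonPool-worker-*` threads, but the exact set is not deterministic.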

Upvotes: 0

Chen Wei

Reputation: 402

Actually, YARN is responsible for allocating and de-allocating resources for the containers requested by an ApplicationMaster (the MR AppMaster or the Spark driver). The RPC between them is all about negotiating resources; YARN does not concern itself with how tasks run inside MapReduce or Spark.

For Hadoop MapReduce, each task (mapper or reducer) is a single process running in its own JVM; it does not use multi-threading.

For Spark, each executor is actually composed of many worker threads. Each Spark task corresponds to one task (a single process) in MapReduce, but runs as a thread inside the executor. So Spark is built on a multi-threaded model, which lowers JVM overhead and the cost of shuffling data between tasks.
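As a rough analogy only (this is not Spark's actual code), an executor can be pictured as a fixed-size thread pool inside one JVM, with each task submitted to that pool. The class `ExecutorAnalogy`, the method `runTasks`, and the parameters `cores` and `numTasks` are all hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorAnalogy {

    // Runs numTasks tasks on a pool of `cores` threads, all inside this one JVM,
    // and returns one "task N ran on <thread>" line per task, in task order.
    static List<String> runTasks(int cores, int numTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < numTasks; i++) {
                final int taskId = i;
                Callable<String> task = () ->
                        "task " + taskId + " ran on " + Thread.currentThread().getName();
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // wait for each task to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Six tasks share two threads, loosely like six tasks on a two-core executor.
        runTasks(2, 6).forEach(System.out::println);
    }
}
```

Because the tasks share one process, they also share its heap and its fate: if the JVM dies, every task in the pool is lost with it.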

In my experience, the multi-threaded model lowers overhead but makes fault tolerance much more expensive. If a Spark executor fails, all the tasks running inside it have to be re-run, whereas in MapReduce only the single failed task needs to be re-run. Spark also suffers from heavy memory pressure, because all the tasks inside an executor need to cache data as RDDs, while a MapReduce task processes only one block at a time.

Hope this is helpful.

Upvotes: 3
