Reputation: 337
I'm unable to understand the internal mechanism of resource allocation for MapReduce and Spark jobs.
We can run MapReduce and Spark jobs in the same cluster, but for MapReduce jobs the internal resource manager allocates the available resources, such as data nodes and task trackers, to the job. Internally the job may require 'N' mappers and reducers.
When it comes to a Spark context, it needs worker nodes and executors (internally JVMs) to compute the program.
Does that mean there will be different nodes for MapReduce and Spark jobs? If not, how does the differentiation happen between task trackers and executors? How does the cluster manager identify the specific node for a Hadoop or Spark job?
Can someone enlighten me here?
Upvotes: 1
Views: 798
Reputation: 1057
Task trackers and executors are all daemons.
When an MR job is submitted, the JobTracker or ResourceManager service allocates an appropriate NodeManager with the required resources.
When a Spark job is submitted, the ApplicationMaster acquires worker nodes where resources are available near the data, and submits/deploys tasks on those nodes through the executor service.
It's just the different services/daemons of the underlying framework - whether MR or Spark - that manage the whole job scheduling and start JVMs with the appropriate resources on the appropriate nodes.
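To make this concrete, here is a minimal Scala sketch (the application name and resource values are assumptions, not taken from the question) of a Spark application asking YARN - the same ResourceManager that also schedules MapReduce containers - to start a given number of executor JVMs:

import org.apache.spark.sql.SparkSession

object YarnResourceSketch {
  def main(args: Array[String]): Unit = {
    // The same YARN ResourceManager that grants containers to MapReduce
    // jobs also grants the containers in which these executor JVMs run.
    val spark = SparkSession.builder()
      .appName("yarn-resource-sketch")          // hypothetical application name
      .master("yarn")                           // ask YARN rather than a Spark-only cluster manager
      .config("spark.executor.instances", "4")  // number of executor JVMs requested
      .config("spark.executor.memory", "2g")    // memory per executor container
      .config("spark.executor.cores", "2")      // cores per executor container
      .getOrCreate()

    // A trivial action so YARN actually has to allocate the executors.
    val n = spark.sparkContext.parallelize(1 to 1000).count()
    println(s"count = $n")

    spark.stop()
  }
}

On YARN, both a MapReduce job and this Spark application show up as ordinary YARN applications; the NodeManagers simply launch whatever container (a map/reduce task JVM or a Spark executor JVM) the scheduler assigns to them.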
Upvotes: 1
Reputation: 155
In my opinion, when running a Spark program, it is split into several Spark jobs, and each job is split into several tasks. The tasks come in different types, including map and reduce. Map-reduce is only a concrete computation process.
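As a rough illustration (the data and application name below are made up), a single collect() action in this sketch triggers one Spark job, and the shuffle introduced by reduceByKey splits that job into a map-side stage and a reduce-side stage, each running one task per partition:

import org.apache.spark.sql.SparkSession

object JobStageTaskSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("job-stage-task-sketch")  // hypothetical application name
      .master("local[*]")                // local mode is enough to observe the split
      .getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"), numSlices = 3)

    // collect() is the action that triggers the job; the shuffle at
    // reduceByKey is the boundary between the two stages.
    val counts = words
      .map(w => (w, 1))    // map-style tasks
      .reduceByKey(_ + _)  // shuffle -> reduce-style tasks
      .collect()

    counts.foreach(println)
    spark.stop()
  }
}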
Upvotes: 0