Reputation: 476

How is RAM used in MapReduce processing?

I need clarification on processing. Daemons such as the NameNode, DataNode, JobTracker, and TaskTracker all live in a cluster (in a single-node cluster, they are distributed on the hard disk). What is the use of RAM or cache in MapReduce processing, and how is it accessed by the various processes in MapReduce?

Upvotes: 1

Answers (3)

Ravindra babu

Reputation: 38950

RAM is used during the processing of a MapReduce application.

Once the data is read through InputSplits (from HDFS blocks) into memory (RAM), processing happens on the data held in RAM.

mapreduce.map.memory.mb = The amount of memory to request from the scheduler for each map task.

mapreduce.reduce.memory.mb = The amount of memory to request from the scheduler for each reduce task.

The default value for both of the above parameters is 1024 MB (1 GB).
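As a rough illustration of the two settings above, here is a minimal Python sketch that builds the corresponding `<property>` entries in the form Hadoop configuration files use. The helper name `build_memory_properties` is hypothetical and exists only for this example; the property names and the 1024 MB defaults are the ones quoted above.

```python
import xml.etree.ElementTree as ET


def build_memory_properties(map_mb=1024, reduce_mb=1024):
    """Return a <configuration> element holding the two memory settings.

    build_memory_properties is a hypothetical helper for illustration only;
    the property names are the ones discussed in this answer.
    """
    config = ET.Element("configuration")
    settings = {
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.reduce.memory.mb": reduce_mb,
    }
    for name, value in settings.items():
        # Each setting becomes <property><name>..</name><value>..</value></property>.
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return config


if __name__ == "__main__":
    # Serialize the fragment so it can be inspected or pasted into a config file.
    print(ET.tostring(build_memory_properties(), encoding="unicode"))
```

Passing other values, for example `build_memory_properties(2048, 4096)`, would request larger containers for map and reduce tasks; whether the scheduler can grant them depends on the cluster.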

Several more memory-related parameters are used in the MapReduce phases. Have a look at the documentation page for mapred-site.xml for more details.

Related SE questions:

Mapreduce execution in a hadoop cluster

Upvotes: 1

BruceWayne

Reputation: 3374

You mentioned a cluster in your question; a single server or machine is not called a cluster.
Daemons (processes) are not distributed across hard disks; they use RAM to run.
Regarding cache, look at this answer.

Upvotes: 1

siddhartha jain

Reputation: 1006

The JobTracker and TaskTracker were used to manage cluster resources in MapReduce 1.x; they were removed because that approach was not efficient. Since MapReduce 2.x, a new mechanism called YARN has been used instead. You can visit http://javacrunch.in/Yarn.jsp for an in-depth look at how YARN works.

Hadoop daemons use RAM to optimize job execution. For example, in MapReduce, RAM is used to keep resource logs in memory when a new job is submitted, so that the resource manager can decide how to distribute the job across the cluster.

One more important point: Hadoop MapReduce performs disk-oriented jobs. It uses the disk while executing a job, and that is a major reason why it is slower than Spark.
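To make the disk-oriented point more concrete, here is a simplified Python sketch of a map task that buffers its output in RAM and spills sorted runs to disk once the buffer fills. It is only a toy model under stated assumptions: `run_map_task`, `word_count_map`, and `buffer_limit` are hypothetical names, and the limit counts key/value pairs, whereas real Hadoop sizes its sort buffer in megabytes (for example via `mapreduce.task.io.sort.mb`).

```python
import os
import tempfile


def run_map_task(records, map_fn, buffer_limit=4):
    """Apply map_fn to each record, holding output pairs in an in-memory buffer
    and spilling a sorted run to a temporary file once buffer_limit is reached.

    Returns the list of spill file paths. All names here are hypothetical.
    """
    buffer = []
    spill_paths = []

    def spill():
        # Sort by key, write the run to disk, then free the in-memory buffer.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "w") as out:
            for key, value in buffer:
                out.write(f"{key}\t{value}\n")
        spill_paths.append(path)
        buffer.clear()

    for record in records:
        buffer.extend(map_fn(record))
        if len(buffer) >= buffer_limit:
            spill()
    if buffer:
        # Flush whatever is still held in memory when the input ends.
        spill()
    return spill_paths


def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated token.
    return [(word, 1) for word in line.split()]
```

Calling `run_map_task(lines, word_count_map)` on a list of text lines returns the paths of the spill files; a reducer would then read and merge those files from disk, which is where much of the extra I/O comes from.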

Hope this solves your query.

Upvotes: 1
