Reputation: 476
I need clarification on processing. Daemons like the NameNode, DataNode, JobTracker, and TaskTracker all live in a cluster (in a single-node cluster they are distributed on the hard disk). What is the use of RAM or cache in MapReduce processing, and how is it accessed by the various processes in MapReduce?
Upvotes: 1
Views: 1794
Reputation: 38950
RAM is used while a MapReduce application processes its data.
Once the data has been read through InputSplits (from HDFS blocks) into memory (RAM), the processing happens on the data held in RAM.
mapreduce.map.memory.mb = The amount of memory to request from the scheduler for each map task.
mapreduce.reduce.memory.mb = The amount of memory to request from the scheduler for each reduce task.
The default value for both of the above parameters is 1024 MB (1 GB).
Several other memory-related parameters are used in the MapReduce phases. Have a look at the documentation page for mapred-site.xml (the defaults are listed in mapred-default.xml) for more details.
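As a rough sketch, the two memory parameters could be overridden in mapred-site.xml along these lines. The values are illustrative assumptions, not recommendations. The `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` entries are not mentioned above; they are included on the assumption that you also want to cap the JVM heap somewhat below the container size so that non-heap memory still fits.

```xml
<!-- mapred-site.xml: illustrative values only; tune for your own cluster -->
<configuration>
  <!-- Memory requested from the scheduler for each map task (MB) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- Assumed heap cap for the map JVM, kept below the container size -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <!-- Memory requested from the scheduler for each reduce task (MB) -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- Assumed heap cap for the reduce JVM, kept below the container size -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
  </property>
</configuration>
```

The same properties can usually be set per job instead of cluster-wide, which is often the safer place to experiment.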
Related SE questions:
Mapreduce execution in a hadoop cluster
Upvotes: 1
Reputation: 3374
Upvotes: 1
Reputation: 1006
The JobTracker and TaskTracker were used to manage cluster resources in MapReduce 1.x; they were removed because that approach was not efficient. MapReduce 2.x introduced a new mechanism called YARN. You can visit this link http://javacrunch.in/Yarn.jsp for an in-depth explanation of how YARN works. Hadoop daemons use RAM to optimize job execution: for example, in MapReduce, RAM is used to keep resource logs in memory so that when a new job is submitted the ResourceManager can work out how to distribute it across the cluster. One more important point is that Hadoop MapReduce is disk-oriented: it uses the disk while executing a job, and that is a major reason why it is slower than Spark.
Hope this solves your query.
Upvotes: 1