user2017

Reputation: 464

Do mappers store their intermediate output in the RAM of the datanode they run on?

Is my understanding correct that the JobTracker launches a task (mapper/reducer) on the datanode where the input split is stored, runs that task on that piece of data, and the mapper stores its intermediate output in its local storage?

So my question is: since the mapper runs on a datanode, does it store its intermediate data in the datanode's RAM? After all, the datanode's disk is part of HDFS, and intermediate output is not stored on HDFS.

Upvotes: 5

Views: 425

Answers (2)

Sumeet Gupta

Reputation: 198

Map tasks initially store their output in an in-memory buffer on the datanode.

Once the buffer fills to 80% of its capacity (the default threshold), its contents start spilling to the local disk of the datanode itself (not HDFS). This disk location can be viewed or modified in mapred-site.xml in Hadoop 2.0 under the property name:

mapreduce.cluster.local.dir
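
To make the buffer-then-spill behaviour concrete, here is a toy Python sketch. It is not Hadoop code: the `SpillBuffer` class, the 100-byte capacity, and the file naming are all hypothetical, and the temporary directory only stands in for a path listed in mapreduce.cluster.local.dir. Real Hadoop also spills in a background thread and sorts by partition as well as key, which this simplification skips.

```python
import os
import tempfile


class SpillBuffer:
    """Toy model of a map-side output buffer (not Hadoop's real implementation)."""

    def __init__(self, capacity_bytes, spill_threshold=0.80, local_dir=None):
        self.capacity_bytes = capacity_bytes
        self.spill_threshold = spill_threshold
        # Assumption: a temp dir stands in for a node-local (non-HDFS) directory.
        self.local_dir = local_dir or tempfile.mkdtemp(prefix="map_local_")
        self.records = []
        self.used_bytes = 0
        self.spill_files = []

    def collect(self, key, value):
        """Buffer one record in memory; spill once the threshold is reached."""
        record = f"{key}\t{value}\n".encode("utf-8")
        self.records.append(record)
        self.used_bytes += len(record)
        if self.used_bytes >= self.capacity_bytes * self.spill_threshold:
            self.spill()

    def spill(self):
        """Write the buffered records, sorted, to a file on local disk."""
        if not self.records:
            return
        path = os.path.join(self.local_dir, f"spill{len(self.spill_files)}.out")
        with open(path, "wb") as f:
            f.writelines(sorted(self.records))
        self.spill_files.append(path)
        self.records = []
        self.used_bytes = 0


buf = SpillBuffer(capacity_bytes=100)
for i in range(20):
    buf.collect(f"key{i:02d}", "1")  # each record is 8 bytes
buf.spill()  # flush anything still buffered when the map task finishes
```

With 8-byte records and an 80-byte threshold, the buffer spills after every tenth record, so the map output never has to fit in RAM all at once.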

Upvotes: 2

Kris

Reputation: 1724

The output of the mapper (intermediate data) is stored on the local file system (not HDFS) of the node where each mapper runs. This is typically a temporary directory that the Hadoop administrator can set up in the configuration. Once the job has completed and the data has been transferred to the reducers, this intermediate data is cleaned up and is no longer accessible.
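
As a rough illustration of that lifecycle, the sketch below writes intermediate key/value pairs to a local temporary directory, reads them back in a "reduce" step, and then deletes the directory. Everything here is a simplified assumption: the file name `map_0.out` and the directory prefix are made up, and in a real cluster reducers fetch map output over the network rather than reading the same local file.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a node-local temporary directory (not HDFS).
local_dir = tempfile.mkdtemp(prefix="mapred_local_")

# "Map" phase: write intermediate key/value pairs to the local directory.
map_output = os.path.join(local_dir, "map_0.out")
with open(map_output, "w") as f:
    for word in ["a", "b", "a"]:
        f.write(f"{word}\t1\n")

# "Reduce" phase: read the intermediate data and aggregate the counts.
counts = {}
with open(map_output) as f:
    for line in f:
        key, value = line.rstrip("\n").split("\t")
        counts[key] = counts.get(key, 0) + int(value)

# Cleanup: once the job is done, the intermediate files are removed.
shutil.rmtree(local_dir)
```

Only the final `counts` survive; the intermediate file is gone after cleanup, which is why it cannot be inspected once the job has finished.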

Upvotes: 5
