Surender Raja

Reputation: 3599

In MapReduce, does replication apply to intermediate data also?

In MapReduce, the output produced by mappers is called intermediate data.

Is intermediate data also replicated?

Is intermediate data temporary?

When will intermediate data get deleted? Is it deleted automatically or do we need to explicitly delete it?

Upvotes: 4

Views: 624

Answers (1)

Ram Ghadiyaram

Reputation: 29155

A mapper's spill files are stored on the local file system of the worker node where the mapper runs, not in HDFS, so they are not replicated. Similarly, map output that a reducer fetches from another node is stored on the local file system of the worker node where that reduce task runs.

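This local path is derived from the hadoop.tmp.dir property, which defaults to '/tmp/hadoop-${user.name}'. MapReduce's own setting is mapreduce.cluster.local.dir, which defaults to '${hadoop.tmp.dir}/mapred/local'. On YARN clusters, as far as I know, the NodeManager's yarn.nodemanager.local-dirs setting determines the actual directories.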
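To see where that ends up on disk, here is a minimal sketch of how the nested defaults expand. It does not use Hadoop at all: the `resolve` helper and the user name `alice` are made up for illustration, and the two default values are the stock ones from core-default.xml and mapred-default.xml as I recall them, so check them against your Hadoop version.

```python
import re

# Assumed stock defaults (verify against your own core-default.xml
# and mapred-default.xml; they can differ between versions).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapreduce.cluster.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(key, props, max_depth=20):
    """Expand ${name} references in props[key].

    Hypothetical helper that imitates property substitution;
    it is not Hadoop's Configuration class.
    """
    value = props[key]
    for _ in range(max_depth):
        # Replace every known ${name}; leave unknown names untouched.
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return value
        value = expanded
    raise ValueError("too many nested substitutions for " + key)


# 'alice' stands in for the OS user that runs the task.
props = {**DEFAULTS, "user.name": "alice"}
print(resolve("mapreduce.cluster.local.dir", props))
# prints: /tmp/hadoop-alice/mapred/local
```

So with untouched defaults the intermediate files live under a per-user directory in /tmp on each worker, which is why many clusters point these properties at dedicated local disks instead.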

After the job completes or fails, the temporary location used on the local file system is cleared automatically. You don't have to perform any cleanup yourself; the framework handles it.

Upvotes: 5
