Reputation: 3599
In MapReduce, the output produced by the mappers is called intermediate data.
Is intermediate data also replicated?
Is intermediate data temporary?
When does intermediate data get deleted? Is it deleted automatically, or do we need to delete it explicitly?
Upvotes: 4
Views: 624
Reputation: 29155
A mapper's spilled files are stored on the local file system of the worker node where the mapper is running. Similarly, data streamed from one node to another is stored on the local file system of the worker node where the task is running.
This local file system path is specified by
hadoop.tmp.dir
property, which defaults to /tmp/hadoop-${user.name} (that is, a directory under /tmp).
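To make the path concrete, here is a rough sketch in plain Java (no Hadoop dependency) of how such a property value ends up as an actual directory. It assumes the stock defaults as I remember them from core-default.xml and mapred-default.xml: hadoop.tmp.dir is /tmp/hadoop-${user.name}, and mapreduce.cluster.local.dir is ${hadoop.tmp.dir}/mapred/local. The class LocalDirSketch and its methods are hypothetical names made up for illustration. They only mimic the ${...} expansion and are not Hadoop's own code, so check the values against your cluster's configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: mimics ${name} expansion in Hadoop-style property values.
// It is NOT Hadoop's implementation, only an illustration of the idea.
class LocalDirSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Expands ${name} references using props first, then JVM system properties.
    // Unknown names are left as-is; the number of passes is capped to avoid cycles.
    static String expand(String value, Map<String, String> props) {
        String current = value;
        for (int pass = 0; pass < 20; pass++) {
            Matcher m = VAR.matcher(current);
            StringBuilder out = new StringBuilder();
            int last = 0;
            boolean changed = false;
            while (m.find()) {
                String name = m.group(1);
                String replacement = props.containsKey(name)
                        ? props.get(name)
                        : System.getProperty(name);
                out.append(current, last, m.start());
                if (replacement == null) {
                    out.append(m.group()); // unknown variable: keep the literal text
                } else {
                    out.append(replacement);
                    changed = true;
                }
                last = m.end();
            }
            out.append(current, last, current.length());
            current = out.toString();
            if (!changed) {
                break;
            }
        }
        return current;
    }

    // Assumed stock defaults; a real cluster may override any of these.
    static Map<String, String> defaults(String userName) {
        Map<String, String> props = new HashMap<>();
        props.put("user.name", userName);
        props.put("hadoop.tmp.dir", "/tmp/hadoop-${user.name}");
        props.put("mapreduce.cluster.local.dir", "${hadoop.tmp.dir}/mapred/local");
        return props;
    }

    // Looks up a key and returns its fully expanded value.
    static String resolve(String key, Map<String, String> props) {
        return expand(props.get(key), props);
    }

    public static void main(String[] args) {
        Map<String, String> props = defaults("alice");
        System.out.println(resolve("hadoop.tmp.dir", props));
        System.out.println(resolve("mapreduce.cluster.local.dir", props));
    }
}
```

With the user name alice, the two values resolve to /tmp/hadoop-alice and /tmp/hadoop-alice/mapred/local. On a YARN cluster the NodeManager's own local directories setting may be the one that matters instead, so treat this only as a way to reason about where the spill files are likely to sit.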
After the job completes or fails, the temporary location used on the local file system is cleared automatically. You don't have to perform any cleanup yourself; the framework handles it.
Upvotes: 5