dolphinZhang

Reputation: 83

Where is the intermediate data produced by each stage of Hadoop MapReduce stored?

I have been learning Hadoop MapReduce for a while. As you know, Hadoop uses HDFS to store data files on hard disks, and when we run a MapReduce job, the program reads its input from HDFS. But where is the data stored during each stage of MapReduce? I have found two different answers:

  1. HDFS
  2. The local hard disk of the node where MapReduce runs

Upvotes: 2

Views: 402

Answers (1)

Sandeep Singh

Reputation: 7990

Generally, intermediate data files generated by map and reduce tasks are stored in a directory on the local disk of the node where the task runs. The directory contains:

  • Output files generated by the map tasks that serve as input for the reduce tasks.
  • Temporary files generated by the reduce tasks.

The temporary data locations are controlled by the mapreduce.cluster.local.dir property. You can configure one or more locations for the intermediate data generated by the map and reduce tasks.
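
As a rough illustration, the property could be set in mapred-site.xml along the following lines. The two directory paths are hypothetical placeholders, not values from any real cluster. The value is a comma-separated list, which lets the intermediate files be spread across different disks.

```xml
<!-- mapred-site.xml (sketch; both paths below are assumed examples) -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <!-- Comma-separated list of local directories for intermediate data -->
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>
```

If the property is not set, it defaults to a mapred/local directory under hadoop.tmp.dir.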

If one of the configured locations does not have enough space for the intermediate data, the data can be written to another disk that has sufficient space available.

This link can be useful to know more about it.

Upvotes: 2
