Reputation: 83
I have been learning Hadoop MapReduce for a while. As you know, Hadoop uses HDFS to store data files on hard disks, and when we run a MapReduce job, the program reads its input from HDFS. But where is the data stored during each stage of MapReduce? I got some answers
Upvotes: 2
Views: 402
Reputation: 7990
Generally, intermediate data files generated by map and reduce tasks are stored in a directory on the local disk of the node where the task runs. The directory contains:
The temporary data locations are controlled by the mapreduce.cluster.local.dir property. You can configure one or more locations, as a comma-separated list of directories, for the intermediate data generated by the map and reduce tasks.
If the node executing the task does not have enough space in one location to store the intermediate data, the data can be stored on another disk where sufficient space is available.
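To make the idea of several configured locations concrete, here is a minimal sketch in plain Java. It is not Hadoop's actual allocation code; the class LocalDirPicker, its methods, and the example paths are hypothetical and only illustrate splitting a comma-separated list of directories and falling back to another directory when one lacks space.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; Hadoop ships its own allocator.
class LocalDirPicker {

    // Split a comma-separated property value into trimmed, non-empty paths.
    static List<String> parseDirs(String propertyValue) {
        List<String> dirs = new ArrayList<>();
        for (String part : propertyValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    // Return the first path that is an existing directory and reports
    // at least requiredBytes of usable space, or empty if none qualifies.
    static Optional<String> pickDir(List<String> dirs, long requiredBytes) {
        for (String dir : dirs) {
            File f = new File(dir);
            if (f.isDirectory() && f.getUsableSpace() >= requiredBytes) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed example value; real paths depend on your cluster layout.
        String value = "/data/1/mapred/local, /data/2/mapred/local";
        List<String> dirs = parseDirs(value);
        System.out.println("Configured directories: " + dirs);
        System.out.println("Chosen directory: " + pickDir(dirs, 64L * 1024 * 1024));
    }
}
```

The sketch simply takes the first directory with enough room, which is a simplification chosen for readability; the selection policy Hadoop itself applies may differ.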
This link can be useful to know more about it.
Upvotes: 2