Reputation: 329
As I understand Hadoop MapReduce jobs, mapper output is written to local storage rather than to HDFS, because it is intermediate, throwaway data and there is no point in storing it in HDFS.
But with Sqoop, I see that the mapper output file part-m-00000 is written to HDFS. So my question is: is there a setting in Hadoop that controls where mapper output gets written, and does it default to local storage?
Upvotes: 1
Views: 1895
Reputation: 1810
If a job has no reducers, the mapper output is the job's final output and is written to HDFS. Even in this case, the mapper output is not written directly to HDFS; it is first written to the individual node's disk and then copied over to HDFS.
Sqoop is one such scenario: it typically runs as a map-only job, where you want to pull data from a table in parallel but do not need to reduce the data on any condition.
See this link: Identity Reducer vs zero reducer
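To make the rule above concrete, here is a minimal sketch in plain Java. It is a toy model, not Hadoop code: the class `MapOutputModel` and its methods are hypothetical names made up for illustration. It only encodes the behaviour described above (zero reducers means the mapper output is the final output) and the usual `part-m-NNNNN` / `part-r-NNNNN` file naming, which is why Sqoop's file is called part-m-00000. In a real driver, the reducer count is normally set with `job.setNumReduceTasks(0)` or the `mapreduce.job.reduces` property.

```java
// Toy model (NOT Hadoop code): illustrates how the number of reduce tasks
// decides whether mapper output is the job's final output, and how the
// final output files are conventionally named.
class MapOutputModel {

    /** True when mapper output is the job's final output (stored in HDFS). */
    public static boolean mapperOutputIsFinal(int numReduceTasks) {
        // With zero reducers there is no shuffle, so nothing is "throwaway".
        return numReduceTasks == 0;
    }

    /** Conventional name of a final output file in the job's output directory. */
    public static String outputFileName(int numReduceTasks, int taskIndex) {
        // Map-only jobs emit part-m-NNNNN; jobs with reducers emit part-r-NNNNN.
        String taskType = mapperOutputIsFinal(numReduceTasks) ? "m" : "r";
        return String.format("part-%s-%05d", taskType, taskIndex);
    }

    public static void main(String[] args) {
        // A Sqoop-style map-only job: first mapper's output file.
        System.out.println(outputFileName(0, 0)); // part-m-00000
        // A job with two reducers: second reducer's output file.
        System.out.println(outputFileName(2, 1)); // part-r-00001
    }
}
```

So the file name itself tells you which kind of task produced the final output: an `m` means the job ran without a reduce phase.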
Upvotes: 2