Karthick

Reputation: 2882

Does Hadoop MapReduce open temporary files in HDFS?

When a MapReduce job runs, it creates many temporary files to store the results of the various mappers and reducers. Are those temporary files written to HDFS?

If yes, the NameNode's edit log could become huge in a short time, given that it records each and every transaction (file open, close, etc.). Can that be avoided by writing directly to the native filesystem instead of HDFS, or is that a bad idea?

Upvotes: 0

Views: 1144

Answers (2)

user3484461

Reputation: 1133

The intermediate results of a MapReduce job are written to the local file system, not HDFS, and they are removed automatically once the job completes.

In other words, the mapper's output is written to the local file system. The specific location can be configured, but by default it is written under /tmp/hadoop-username*.
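To illustrate where that default comes from, here is a sketch of the relevant configuration, assuming Hadoop 2.x property names. The intermediate-data directory (`mapreduce.cluster.local.dir`) defaults to a path under `hadoop.tmp.dir`, which itself defaults to `/tmp/hadoop-${user.name}` — the location mentioned above. The example values shown are illustrative, not recommendations:

```xml
<!-- core-site.xml: base temp directory; default is /tmp/hadoop-${user.name} -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
</property>

<!-- mapred-site.xml: local dirs for intermediate map output;
     default is ${hadoop.tmp.dir}/mapred/local.
     A comma-separated list spreads I/O across disks. -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local</value>
</property>
```

Note that these are local paths on each worker node, not HDFS paths, so writes here never touch the NameNode's edit log.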

Upvotes: 1

Mohammed Niaz

Reputation: 396

If you mean that temporary files are created each time a mapper runs, then yes, and you can't avoid it: the mapper's output is written to disk rather than kept in memory. The TaskTracker takes care of setting up the MR job and creating temporary disk space for the mappers' intermediate output, and it cleans that space up once the job completes.
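While the disk writes themselves can't be avoided, how much is spilled can be tuned through the map-side sort buffer. A hedged sketch, assuming Hadoop 2.x property names; the values below are examples, not defaults you must change:

```xml
<!-- mapred-site.xml: in-memory buffer for map output before spilling
     to local disk (default 100 MB). A larger buffer means fewer spill
     files and less intermediate disk I/O. -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
</property>

<!-- Fraction of the buffer that triggers a background spill (default 0.80). -->
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
</property>
```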

This is one of the bottlenecks of the MR programming paradigm.

Any comments/feedback would be appreciated.

Upvotes: 0
