mumbai

Reputation: 11

Process a small file with MapReduce in Hadoop

I have a 456 KB file which is read from HDFS and given as input to the mapper function. Every line contains an integer, for which I download some files and store them on the local system. I have Hadoop set up on a two-node cluster, and the split size is changed in the program so that 8 mappers are opened:

    Configuration configuration = new Configuration();

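    // force ~60 KB splits so the 456 KB input fans out to roughly 456000 / 60000 ≈ 8 map tasks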
    configuration.setLong("mapred.max.split.size", 60000L);
    configuration.setLong("mapred.min.split.size", 60000L);

8 mappers are created, but the same data is downloaded on both servers. I think this is happening because the block size is still set to the default of 256 MB and the input file is processed twice. So my question is: can we process a small file with MapReduce?

Upvotes: 0

Views: 231

Answers (1)

SSaikia_JtheRocker

Reputation: 5063

If your file downloads take time, you might be suffering from what's called speculative execution in Hadoop, which is enabled by default. It's just a guess though, since you said you are getting the same files downloaded more than once.

With speculative execution turned on, the same input can be processed multiple times in parallel to exploit differences in machine capabilities. As most of the tasks in a job come to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform.

You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively.
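A minimal sketch of how this could look in your existing setup (only the property names come from the old mapred API; the surrounding job configuration is assumed to be yours from the question):

    Configuration configuration = new Configuration();

    // keep the forced ~60 KB splits from the question
    configuration.setLong("mapred.max.split.size", 60000L);
    configuration.setLong("mapred.min.split.size", 60000L);

    // disable speculative execution so map/reduce tasks are not
    // scheduled redundantly on the second node while downloads are slow
    configuration.setBoolean("mapred.map.tasks.speculative.execution", false);
    configuration.setBoolean("mapred.reduce.tasks.speculative.execution", false);

If the duplicate downloads disappear after this, speculative execution was indeed the cause.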

Upvotes: 1
