Harsh

Reputation: 265

How to use MultithreadedMapper class in Hadoop Mapreduce?

I came across the MultithreadedMapper class in the new Hadoop version, and the documentation says that it can be used instead of the conventional (single-threaded) mapper class. However, I haven't found any example showing how to use this new class. I would also like to use the setNumberOfThreads() method. Is there a code example for this?

Thanks in advance

Upvotes: 4

Views: 2471

Answers (1)

Thomas Jungblut

Reputation: 20969

Here is a small code snippet for you:

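Configuration conf = new Configuration();
// Set these before creating the Job: the Job constructor copies the
// Configuration, so later conf.set(...) calls would not reach the job.
// The keys are version-specific (newer Hadoop releases renamed them).
conf.set("mapred.map.multithreadedrunner.class", WebGraphMapper.class.getName());
conf.set("mapred.map.multithreadedrunner.threads", "8");
Job job = new Job(conf);
// The framework runs MultithreadedMapper, which delegates to WebGraphMapper
job.setMapperClass(MultithreadedMapper.class);
job.setJarByClass(WebGraphMapper.class);
// rest omitted
job.waitForCompletion(true);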
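The property-based configuration works because MultithreadedMapper reads your mapper's class name from the configuration at runtime and instantiates it reflectively. The plain-Java sketch below (no Hadoop dependency; the `Properties` map and the `"mapper.class"` key are hypothetical stand-ins for the job Configuration) suggests why `getName()` is safer than `getCanonicalName()` when storing the class name: for a nested class, the canonical name uses a dot, which `Class.forName` cannot resolve.

```java
import java.util.Properties;

public class ReflectiveLookupSketch {

    // Hypothetical nested mapper class standing in for WebGraphMapper.
    public static class WebGraphMapper {
        public String map(String line) {
            return line.toUpperCase();
        }
    }

    // Reads a class name from a properties map and instantiates it reflectively,
    // similar in spirit to how a config-driven wrapper loads its delegate mapper.
    static Object instantiate(Properties props, String key) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(props.getProperty(key));
        return clazz.getDeclaredConstructor().newInstance();
    }

    // Returns true if the class stored under the hypothetical "mapper.class" key can be loaded.
    public static boolean canLoad(String className) {
        Properties props = new Properties();
        props.setProperty("mapper.class", className);
        try {
            return instantiate(props, "mapper.class") instanceof WebGraphMapper;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name, e.g. "ReflectiveLookupSketch$WebGraphMapper": resolvable.
        System.out.println(canLoad(WebGraphMapper.class.getName()));          // true
        // Canonical name, e.g. "ReflectiveLookupSketch.WebGraphMapper": not resolvable.
        System.out.println(canLoad(WebGraphMapper.class.getCanonicalName())); // false
    }
}
```

For a top-level mapper class the two names are identical, so the difference only shows up with nested (for example, static inner) mapper classes.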

I think it is fairly self-explanatory: you set MultithreadedMapper as the job's mapper class and then configure which class (your real mapper) it should run. There are also convenience static methods that do this configuration for you, so you don't need to know the property names. A call could look like this:

MultithreadedMapper.setMapperClass(job, WebGraphMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 8);
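Conceptually, MultithreadedMapper runs the configured number of threads, each driving its own instance of your mapper, while reads from the input and writes to the output are synchronized. The sketch below imitates that pattern in plain Java with no Hadoop dependency. The `LineMapper` interface, `TokenMapper` and `runMultithreaded` are hypothetical illustrations, not Hadoop's actual implementation; `numThreads` plays the role of the value passed to `setNumberOfThreads`. Because `map()` calls run concurrently, the Hadoop documentation requires mappers used this way to be thread-safe (for example, no unsynchronized static state), and the pattern mainly helps when the map work is not CPU-bound.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class MultithreadedMapSketch {

    // Hypothetical mapper contract: one input record in, zero or more outputs.
    public interface LineMapper {
        void map(String record, List<String> out);
    }

    // Hypothetical mapper that emits each whitespace-separated token of a line.
    public static class TokenMapper implements LineMapper {
        @Override
        public void map(String record, List<String> out) {
            for (String token : record.split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(token);
                }
            }
        }
    }

    // Runs numThreads workers; each has its own mapper instance, pulls records
    // from a shared iterator under a lock and appends results under a lock.
    public static List<String> runMultithreaded(List<String> input,
                                                Supplier<LineMapper> factory,
                                                int numThreads) {
        Iterator<String> reader = input.iterator();
        List<String> output = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < numThreads; i++) {
            LineMapper mapper = factory.get(); // one mapper instance per thread
            Thread worker = new Thread(() -> {
                List<String> local = new ArrayList<>();
                while (true) {
                    String record;
                    synchronized (reader) { // serialize reads from the input
                        if (!reader.hasNext()) {
                            return;
                        }
                        record = reader.next();
                    }
                    local.clear();
                    mapper.map(record, local); // map() itself runs concurrently
                    synchronized (output) { // serialize writes to the output
                        output.addAll(local);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b", "c d e", "f");
        List<String> tokens = runMultithreaded(lines, TokenMapper::new, 8);
        // Prints 6; the order of tokens depends on thread scheduling.
        System.out.println(tokens.size());
    }
}
```

Only the map calls overlap; input and output access is serialized, which is why extra threads pay off mostly when each record spends time waiting (for example, on network fetches) rather than computing.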

Upvotes: 8
