Srini Subramanian

Reputation: 161

Hadoop: filtering map output

I have a use case in which certain keys generated by the map phase need to be filtered out before the reduce phase starts. Is something like this doable? Please let me know.

Upvotes: 1

Views: 906

Answers (2)

highlycaffeinated

Reputation: 19867

A couple of options that come to mind:

  • Modify your mapper to not output the values you want to filter
  • Write a reducer that filters out the values you don't want, and feed the output of that reducer to another MapReduce job

Using a combiner is not a good choice for this task because, as @100gods mentions, combiner execution is not guaranteed.
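To illustrate the first option, here is a minimal sketch in plain Java so that it runs without the Hadoop jars. The class name, the `BLOCKED_KEYS` set, and the `emit` callback are all made up for illustration; in a real job the loop body would sit inside your `Mapper`'s `map()` method and `emit.accept(...)` would be a call to `context.write(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

public class FilteringMapSketch {
    // Hypothetical set of keys that must never reach the reducer.
    static final Set<String> BLOCKED_KEYS = Set.of("the", "a");

    // The filter decision lives in one place so it is easy to change.
    static boolean shouldEmit(String key) {
        return !BLOCKED_KEYS.contains(key);
    }

    // Stand-in for Mapper.map(): 'emit' plays the role of context.write(key, value).
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input line
            }
            if (shouldEmit(token)) {
                emit.accept(token, 1); // only unfiltered keys are written
            }
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        map("the quick a fox", (k, v) -> emitted.add(k + "=" + v));
        System.out.println(emitted); // prints [quick=1, fox=1]
    }
}
```

Because the filtered keys are never written, they are not shuffled to the reducers at all, which is also why this tends to be cheaper than filtering in a second job.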

Upvotes: 1

saurabh shashank

Reputation: 1353

Modifying the Mapper class to filter out the unwanted keys is the more reliable approach, because combiner execution is not guaranteed: Hadoop may or may not run a combiner, and if required it may run it more than once. Your MapReduce jobs should therefore not depend on the combiner being executed.

Upvotes: 1
