Reputation: 746
I am trying to work out how to sort my data multiple times without having to go back through the mapper each time.
I'd like to set up: mapper 1 --> reducer 1 --> reducer 2 --> reducer 3
I want reducer 1 to output (key, data) and have that go straight to reducer 2. Is this possible?
From what I have found so far, you can chain jobs, but does that require a mapper for each step?
Whenever I try to run a job without a mapper, it ends with an error. Running a mapper for each step seems like a waste of time and resources if I can just emit the output in the form I need from reducer 1.
Thoughts?
Upvotes: 4
Views: 871
Reputation: 1557
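In short, if you are using Java, ChainMapper and ChainReducer are the classes to look at. With them you can chain an arbitrary number of mappers before and after a single reducer inside one job (the pattern is [MAP+ / REDUCE MAP*]). They do not run several reducers back to back in one job, because every reduce phase needs its own shuffle and sort. A reducer 1 --> reducer 2 --> reducer 3 pipeline therefore still means one job per reducer, with a pass-through (identity) mapper in the later jobs so the extra map step does no real work.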
The book "Hadoop in Action" describes this procedure in chapter 5.
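To illustrate why a second reducer needs a shuffle in front of it even when the mapper does nothing, here is a minimal plain-Java sketch. It does not use the Hadoop API at all: the class name, the `shuffle` helper and the word-count example are all made up for illustration, and the in-memory `TreeMap` only stands in for Hadoop's shuffle/sort. Reducer 1 emits its output already keyed the way reducer 2 wants it, so the step in between is a pure pass-through.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of: mapper 1 -> reducer 1 -> (identity map) -> reducer 2.
class ChainedReduceSketch {

    // Stand-in for the shuffle/sort phase: group values by key, keys in sorted order.
    public static <K extends Comparable<K>, V> TreeMap<K, List<V>> shuffle(List<Map.Entry<K, V>> pairs) {
        TreeMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Mapper 1: emit (word, 1) for every whitespace-separated word.
    public static List<Map.Entry<String, Integer>> map1(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // Reducer 1: sum the counts per word and emit (count, word),
    // i.e. already keyed the way reducer 2 expects its input.
    public static List<Map.Entry<Integer, String>> reduce1(TreeMap<String, List<Integer>> grouped) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) {
                sum += one;
            }
            out.add(new SimpleEntry<>(sum, entry.getKey()));
        }
        return out;
    }

    // Reducer 2: for each count, collect the words that occurred that often, sorted.
    public static TreeMap<Integer, List<String>> reduce2(TreeMap<Integer, List<String>> grouped) {
        TreeMap<Integer, List<String>> out = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : grouped.entrySet()) {
            List<String> words = new ArrayList<>(entry.getValue());
            Collections.sort(words);
            out.put(entry.getKey(), words);
        }
        return out;
    }

    public static TreeMap<Integer, List<String>> run(List<String> lines) {
        List<Map.Entry<Integer, String>> stage1 = reduce1(shuffle(map1(lines)));
        // "Identity mapper": stage1 is passed through unchanged.
        // The regrouping by the new key happens in shuffle(), not in a map step.
        return reduce2(shuffle(stage1));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c d"))); // prints {1=[c, d], 2=[a, b]}
    }
}
```

The point of the sketch is that the only work between the two reducers is the regrouping by the new key. In a real chain of Hadoop jobs that regrouping is the shuffle of the next job, which is why the next job still has a map phase even if it only forwards records.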
Upvotes: 1