user1179295

Reputation: 746

Hadoop Streaming and multiple reducer steps without a mapper between each step

I am trying to work out how to sort my data multiple times without having to go back through a mapper each time.

I'd like to set up: mapper 1 --> reducer 1 --> reducer 2 --> reducer 3

I want reducer 1 to output (key, data) and have that go straight to reducer 2. Is this possible?

From troubleshooting I have learned that you can chain jobs, but does this require a mapper for each step?

Whenever I try to run a job without a mapper, it ends with an error. Running a mapper for each step seems like a waste of time and resources if I can just emit the output in the form I need from reducer 1.

Thoughts?

Upvotes: 4

Views: 871

Answers (1)

vpap

Reputation: 1557

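In short, if you are using Java, ChainMapper and ChainReducer are the classes to look at. Note, though, that a chained job has the shape [MAP+ / REDUCE MAP*]: any number of mappers before a single reducer, and any number of mappers after it. The chain cannot contain several reducers back to back, because each reduce step needs its own shuffle and sort. For reducer 1 --> reducer 2 --> reducer 3 you still need one job per reduce step, and each later job can use an identity (pass-through) mapper so that no real map work is repeated.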
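For Hadoop Streaming, the one-job-per-reduce-step layout can be tried locally before submitting anything. The following is only a minimal sketch in Python, not a definitive implementation: `reducer_1`, `reducer_2`, `shuffle_sort`, and the word-count style data are hypothetical names and logic chosen for illustration, and `shuffle_sort` merely stands in for the sort Hadoop performs between map and reduce. The `identity_mapper` function plays the role that a pass-through mapper such as `/bin/cat` would play in the second streaming job.

```python
from itertools import groupby
from operator import itemgetter


def parse(lines):
    """Split 'key<TAB>value' lines into (key, value) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value


def identity_mapper(lines):
    """Pass every record through unchanged (a pass-through map step)."""
    for line in lines:
        yield line.rstrip("\n")


def shuffle_sort(lines):
    """Local stand-in for Hadoop's shuffle: sort records by their key."""
    return sorted(lines, key=lambda line: line.partition("\t")[0])


def reducer_1(lines):
    """Hypothetical step 1: sum the values per word, emit 'count<TAB>word'."""
    for key, group in groupby(parse(lines), key=itemgetter(0)):
        total = sum(int(value) for _, value in group)
        yield f"{total}\t{key}"


def reducer_2(lines):
    """Hypothetical step 2: collect the words that share the same count."""
    for key, group in groupby(parse(lines), key=itemgetter(0)):
        words = sorted(value for _, value in group)
        yield f"{key}\t{','.join(words)}"


def run_two_reduce_steps(mapper_output):
    """Job 1: sort + reducer_1. Job 2: identity map, sort on new key, reducer_2."""
    job1_output = list(reducer_1(shuffle_sort(mapper_output)))
    job2_input = shuffle_sort(identity_mapper(job1_output))
    return list(reducer_2(job2_input))


if __name__ == "__main__":
    sample = ["b\t1", "a\t1", "b\t1", "c\t1", "a\t1"]
    for record in run_two_reduce_steps(sample):
        print(record)
```

The point of the sketch is that reducer 1 re-keys its output, so the records must be sorted again before reducer 2 can group them. That second sort is what forces a new job; the identity mapper in between does no real work.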

The book "Hadoop in Action" describes this procedure in chapter 5.

Upvotes: 1
