Reputation: 11
I wrote the mapper and the reducer to use the same input and output format, so I can skip further mapping steps and just keep reducing. In this case I only need two reduce passes. It works fine when I chain the scripts with Unix pipes on the command line. However, I need this to work from a bash script.
I've tried listing it twice:
-reducer "python3 reducer.py"\
-reducer "python3 reducer.py"
and I tried piping inside the bash script:
-reducer "python3 reducer.py | python3 reducer.py"
I also tried some other combinations. Some failed outright, and some produced the wrong output. I feel like there is a solution out there, but I can't get it to work. I am using MobaXterm to work in Hadoop.
Upvotes: 0
Views: 103
Reputation: 191884
No, Hadoop Streaming cannot do pipes. If you want to run the reducer output through itself, then inside the reducer script collect the results into a list rather than printing them, and use a loop to re-process that list before printing the final output.
Alternatively, use a MapReduce Combiner.
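As a rough sketch of the "collect into a list and loop" idea: the question does not show what reducer.py actually computes, so the example below assumes a word-count style reducer over tab-separated `key<TAB>count` lines. The names `reduce_pass` and `run` are hypothetical, and the summing logic is only a stand-in for whatever the real reduce step does.

```python
#!/usr/bin/env python3
"""Hypothetical reducer.py that applies the same reduce step twice in one process."""
import sys
from itertools import groupby


def key_of(line):
    """Return the key part of a "key<TAB>value" line."""
    return line.split("\t", 1)[0]


def reduce_pass(lines):
    """One reduce pass: sum the counts for each key (assumed record format)."""
    out = []
    # groupby needs equal keys to be adjacent, so sort by key first.
    for key, group in groupby(sorted(lines, key=key_of), key=key_of):
        total = sum(int(line.split("\t", 1)[1]) for line in group)
        out.append(f"{key}\t{total}")
    return out


def run(lines, passes=2):
    """Collect the input into a list and feed each pass's output to the next."""
    data = [line.rstrip("\n") for line in lines if line.strip()]
    for _ in range(passes):
        data = reduce_pass(data)
    return data


# Only consume stdin when something is actually piped in, as Hadoop Streaming does.
if __name__ == "__main__" and not sys.stdin.isatty():
    for record in run(sys.stdin):
        print(record)
```

With a script shaped like this, the job only needs a single `-reducer "python3 reducer.py"` entry, because both passes happen inside the one process.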
Upvotes: 0