Vitaliy Plokhovskyy

Reputation: 11

How can I run multiple reduce passes with one mapper from a bash file in Hadoop Streaming, using Python?

[image: bash file code]

I formatted the mapper and the reducer the same way, so I can skip further mapping steps and just keep reducing. In this case I only need two reduce passes. This works fine with Unix pipes, but I need it to work from a bash file.

I've tried listing the reducer twice:

    -reducer "python3 reducer.py"\
    -reducer "python3 reducer.py"

and I've tried piping inside the bash file:

    -reducer "python3 reducer.py | python3 reducer.py"

I also tried some other combinations. Some failed outright, and others produced the wrong output. I feel like there is a solution out there, but I can't get it to work. I am using MobaXterm to work in Hadoop.

[image: piping commands]

Upvotes: 0

Views: 103

Answers (1)

OneCricketeer

Reputation: 191884

No, Hadoop Streaming cannot use pipes in the reducer command. If you want to run the reducer output through itself, collect the data into a list inside reducer.py rather than printing it, then use a loop to re-process that list.
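A minimal sketch of that approach, assuming the input arrives as tab-separated `key\tvalue` lines with integer values. The `reduce_pass` logic (re-keying path-like keys such as `a/b/c` to their parent and summing, so each pass rolls totals up one level) and the `PASSES` constant are hypothetical stand-ins for the asker's own reducer. Note that looping inside one reducer only matches a real second reduce phase if every key the later pass groups together reaches the same reducer task, for example when the job runs with a single reducer (`-numReduceTasks 1`).

```python
#!/usr/bin/env python3
"""Hypothetical reducer.py that re-runs its own reduce logic in-process,
because Hadoop Streaming will not pipe a reducer into itself."""
import sys
from collections import defaultdict

PASSES = 2  # hypothetical: how many times to re-process the data


def reduce_pass(pairs):
    """One reduce pass (placeholder logic, not the asker's code).

    Re-keys each value by its parent path ("a/b/c" -> "a/b") and sums
    the values per new key, so each pass rolls totals up one level.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key.rsplit("/", 1)[0]] += value
    return sorted(totals.items())


def run(lines, passes=PASSES):
    # Collect all input into a list instead of printing it right away.
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = line.rstrip("\n").split("\t", 1)
        pairs.append((key, int(value)))
    # Loop over the collected data once per reduce pass.
    for _ in range(passes):
        pairs = reduce_pass(pairs)
    return pairs


if __name__ == "__main__":
    for key, total in run(sys.stdin):
        print(f"{key}\t{total}")
```

With this in place the streaming command keeps a single `-reducer "python3 reducer.py"` entry instead of listing it twice.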

Alternatively, use a MapReduce Combiner (passed to Hadoop Streaming with the `-combiner` option).
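A rough sketch of a streaming combiner, assuming the same `key\tvalue` lines and that summing per key is the aggregation you want (an assumption; `combiner.py` is a hypothetical filename). A combiner runs on the map side, before the reducer, and Hadoop may invoke it zero, one, or several times, so its logic must be safe to apply repeatedly (associative and commutative, like a sum). It would be added to the job as `-combiner "python3 combiner.py"` and shipped to the nodes the same way as reducer.py.

```python
#!/usr/bin/env python3
"""Hypothetical combiner.py: pre-sums values per key on the map side."""
import sys
from itertools import groupby


def parse(lines):
    # Turn "key\tvalue" lines into (key, int) pairs, skipping blank lines.
    for line in lines:
        if not line.strip():
            continue
        key, value = line.rstrip("\n").split("\t", 1)
        yield key, int(value)


def combine(lines):
    # Map output is sorted by key before the combiner sees it,
    # so equal keys arrive adjacent and groupby can merge them.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


if __name__ == "__main__":
    for key, total in combine(sys.stdin):
        print(f"{key}\t{total}")
```

Because the combiner emits the same `key\tvalue` format it reads, the reducer does not need to know whether the combiner ran.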

Upvotes: 0
