backtrack

Reputation: 8144

hadoop - Map reduce on multiple cluster

I have configured a Hadoop cluster with two machines, MA and MB. I run the MapReduce job with the following command:

 hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
     -mapper "python C:\Python33\mapper.py" \
     -reducer "python C:\Python33\redu.py" \
     -input "/user/XXXX/input/input.txt" \
     -output "/user/XXXX/output/out20131112_09"

where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk.

UPDATE

I have finally tracked down the error.

MA- error log

stderr logs
python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

The mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MA's local disk, but not on MB.

Now, do I need to copy my map/reduce programs to MA, or how else can I resolve this?
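One common fix for this class of error (not stated in the original post, so treat it as a suggestion) is to let Hadoop Streaming ship the scripts to every task node with its `-file` option, so no machine needs a pre-installed copy; the mapper/reducer are then invoked by their bare file names. A sketch, reusing the paths from the question:

```shell
# Ship mapper.py and redu.py into each task's working directory,
# then run them by their local names on whichever node the task lands.
hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
    -file "C:\Python33\mapper.py" -mapper "python mapper.py" \
    -file "C:\Python33\redu.py"   -reducer "python redu.py" \
    -input "/user/XXXX/input/input.txt" \
    -output "/user/XXXX/output/out20131112_09"
```

This assumes `python` is on the PATH of every task node.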

Mapper

import sys

# Emit "word<TAB>1" for every whitespace-separated token on stdin.
# Streaming splits key and value on the first tab, so the separator
# must be a bare '\t' with no surrounding spaces.
for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        print('%s\t%d' % (key, value))
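The question does not show redu.py; for completeness, here is a minimal sketch of the matching summing reducer (an assumption about what redu.py does, based on the word-count-style mapper). Hadoop's shuffle phase delivers lines sorted by key, so the reducer only needs to accumulate runs of identical keys:

```python
import sys

def sum_reducer(lines):
    """Sum counts per key, assuming the input lines are sorted by key
    (which Hadoop's shuffle/sort phase guarantees)."""
    current_key, current_count = None, 0
    for line in lines:
        key, _, value = line.strip().partition('\t')
        key = key.strip()
        try:
            count = int(value)
        except ValueError:
            continue  # skip malformed lines
        if key == current_key:
            current_count += count
        else:
            if current_key is not None:
                yield '%s\t%d' % (current_key, current_count)
            current_key, current_count = key, count
    if current_key is not None:
        yield '%s\t%d' % (current_key, current_count)

if __name__ == '__main__':
    for out in sum_reducer(sys.stdin):
        print(out)
```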

Upvotes: 0

Views: 701

Answers (1)

Ion Cojocaru

Reputation: 2583

If the map input file is smaller than dfs.block.size, you will end up with only one map task per job. For small inputs you can force Hadoop to run multiple tasks by setting mapred.max.split.size (in bytes) to a value smaller than dfs.block.size.
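For a streaming job, the property can be passed as a generic -D option before the streaming-specific options; a sketch, with a hypothetical 16 MB cap chosen purely for illustration:

```shell
# Cap splits at 16 MB (16777216 bytes) so a file larger than that
# is divided into several map tasks even below dfs.block.size.
hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
    -D mapred.max.split.size=16777216 \
    -mapper "python mapper.py" -reducer "python redu.py" \
    -input "/user/XXXX/input/input.txt" \
    -output "/user/XXXX/output/out20131112_09"
```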

Upvotes: 2
