backtrack

Reputation: 8144

hadoop - Map reduce on multiple cluster

I have configured a Hadoop cluster with two machines, MA and MB. I run the MapReduce job with the following command:

 hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
     -mapper "python C:\Python33\mapper.py" \
     -reducer "python C:\Python33\redu.py" \
     -input "/user/XXXX/input/input.txt" \
     -output "/user/XXXX/output/out20131112_09"

where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk.

UPDATE

I have finally tracked down the error.

MA- error log

stderr logs
python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

The mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MA's local disk, but not on MB.

Now, do I need to copy my map/reduce programs to MA, or how else can I resolve this?
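One common fix for this class of error (not stated in the original post, so treat it as a suggestion) is to let Hadoop Streaming ship the scripts to every task node with its `-file` option, so no machine needs a pre-installed copy; the mapper/reducer are then invoked by their bare file names. A sketch, reusing the paths from the question:

```shell
# Ship mapper.py and redu.py into each task's working directory,
# then run them by their local names on whichever node the task lands.
hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
    -file "C:\Python33\mapper.py" -mapper "python mapper.py" \
    -file "C:\Python33\redu.py"   -reducer "python redu.py" \
    -input "/user/XXXX/input/input.txt" \
    -output "/user/XXXX/output/out20131112_09"
```

This assumes `python` is on the PATH of every task node.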

Mapper

import sys

# Emit "word<TAB>1" for every whitespace-separated token on stdin.
# Streaming splits key and value on the first tab, so the separator
# must be a bare '\t' with no surrounding spaces.
for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        print('%s\t%d' % (key, value))
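The question does not show redu.py; for completeness, here is a minimal sketch of the matching summing reducer (an assumption about what redu.py does, based on the word-count-style mapper). Hadoop's shuffle phase delivers lines sorted by key, so the reducer only needs to accumulate runs of identical keys:

```python
import sys

def sum_reducer(lines):
    """Sum counts per key, assuming the input lines are sorted by key
    (which Hadoop's shuffle/sort phase guarantees)."""
    current_key, current_count = None, 0
    for line in lines:
        key, _, value = line.strip().partition('\t')
        key = key.strip()
        try:
            count = int(value)
        except ValueError:
            continue  # skip malformed lines
        if key == current_key:
            current_count += count
        else:
            if current_key is not None:
                yield '%s\t%d' % (current_key, current_count)
            current_key, current_count = key, count
    if current_key is not None:
        yield '%s\t%d' % (current_key, current_count)

if __name__ == '__main__':
    for out in sum_reducer(sys.stdin):
        print(out)
```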

Upvotes: 0

Views: 701

Answers (1)

Ion Cojocaru

Reputation: 2583

If the map input file is smaller than dfs.block.size, you will end up with only one map task per job. For small inputs you can force Hadoop to run multiple tasks by setting mapred.max.split.size (in bytes) to a value smaller than dfs.block.size.
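For a streaming job, the property can be passed as a generic -D option before the streaming-specific options; a sketch, with a hypothetical 16 MB cap chosen purely for illustration:

```shell
# Cap splits at 16 MB (16777216 bytes) so a file larger than that
# is divided into several map tasks even below dfs.block.size.
hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar \
    -D mapred.max.split.size=16777216 \
    -mapper "python mapper.py" -reducer "python redu.py" \
    -input "/user/XXXX/input/input.txt" \
    -output "/user/XXXX/output/out20131112_09"
```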

Upvotes: 2
