Reputation: 8144
I have configured a Hadoop cluster with two machines, MA and MB. I run the MapReduce job using the following command:
hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar -mapper "python C:\Python33\mapper.py" -reducer "python C:\Python33\redu.py" -input "/user/XXXX/input/input.txt" -output "/user/XXXX/output/out20131112_09"
where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk.
UPDATE
I have finally tracked down the error.
MA's error log:
stderr logs
python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
The mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk only; they are not present on MA, which is where the task fails. Do I need to copy my MapReduce scripts to MA, or how else can I resolve this?
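With Hadoop streaming, the -mapper and -reducer commands are executed on whichever node runs each task, so the script paths must exist on every node, not just the one that submits the job. Rather than copying the scripts to MA by hand, the usual fix is the streaming -file option, which ships the listed local files into each task's working directory. A sketch against the paths from the question (the output directory is just a fresh name, since Hadoop refuses to reuse an existing one, and this assumes python is on the PATH of both machines):

hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar -file C:\Python33\mapper.py -file C:\Python33\redu.py -mapper "python mapper.py" -reducer "python redu.py" -input "/user/XXXX/input/input.txt" -output "/user/XXXX/output/out20131112_10"

Because -file places the scripts in each task's current working directory, the -mapper and -reducer commands refer to them by bare file name instead of an absolute local path.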
Mapper
import sys

# Emit one "word<TAB>1" pair for every word read from standard input.
for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        # Hadoop streaming splits key and value on the first tab,
        # so print them separated by a single tab with no extra spaces.
        print('%s\t%d' % (key, value))
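The reducer (redu.py) is not shown in the question. Assuming it is the usual word-count reducer that sums the counts emitted by the mapper above, a minimal sketch could look like this:

import sys

current_key = None
current_count = 0

# Streaming delivers the mapper output sorted by key, so all counts
# for the same word arrive on consecutive lines.
for line in sys.stdin:
    key, _, value = line.strip().partition('\t')
    try:
        count = int(value)
    except ValueError:
        continue  # skip blank or malformed lines
    if key == current_key:
        current_count += count
    else:
        if current_key is not None:
            print('%s\t%d' % (current_key, current_count))
        current_key = key
        current_count = count

# Flush the final key.
if current_key is not None:
    print('%s\t%d' % (current_key, current_count))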
Upvotes: 0
Views: 701
Reputation: 2583
If the map input file is smaller than dfs.block.size, you will end up with only one map task running per job. For small inputs you can force Hadoop to run multiple map tasks by setting mapred.max.split.size to a value in bytes smaller than dfs.block.size.
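For example, to cap splits at roughly 1 MB (the value here is only illustrative), the property can be passed as a generic -D option, which must come before the streaming-specific options:

hadoop jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar -D mapred.max.split.size=1048576 -mapper "python mapper.py" -reducer "python redu.py" ...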
Upvotes: 2