Reputation: 33
I am a newbie at using Hadoop streaming with Python. I was successfully able to run the wordcount example explained in most of the references. But when I started with one of my own written small python script, it is showing errors, even though the functionality of code is close to nothing.
The error part on executing the command was :
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
14/12/13 01:47:31 INFO mapred.LocalJobRunner: map task executor complete.
14/12/13 01:47:31 WARN mapred.LocalJobRunner: job_local174189774_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
14/12/13 01:47:32 INFO mapreduce.Job: Job job_local174189774_0001 failed with state FAILED due to: NA
14/12/13 01:47:32 INFO mapreduce.Job: Counters: 0
14/12/13 01:47:32 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
The map.py file is as follows:
import sys
for line in sys.stdin:
line = line.strip()
review_lines = line.split('\n')
for r in review_lines:
review = r.split('\t')
print '%s\t%s' % (review[0], review[1])
The red.py file is as follows:
import sys
for line in sys.stdin:
line = line.strip()
word = line.split('\t')
print '%s\t%d' %(word[0], int(word[1]) % 2)
The input which I provided was : (input_file.txt)
R1 1
R2 5
R3 3
R4 2
The command used to run the process was :
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar -file /home/hduser/map.py -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py -input /user/hduser/input_file.txt -output /user/hduser/output_file.txt
Upvotes: 0
Views: 3020
Reputation: 468
Can you try putting this at the top of your scripts:
#!/usr/bin/env python
Upvotes: 3