Shailesh
Shailesh

Reputation: 403

Error in running Python script in Hadoop-Streaming application

I am running Hadoop sample application given in 'Hadoop in Action'by Chuck Lam on Win 7 notebook on Cygwin environment. Python is installed on Cygwin and sample python application running. When I run hadoop streaming application it is throwing following error. Following is command

"bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'RandomSample.py 10' -file RandomSample.py

RandomSample.py is simple application for filtering input.

The following error is thrown:

java.io.IOException: Cannot run program "C:\cygwin64\home\RajS1\hadoop-1.2.1\.\RandomSample.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:376)
        at java.lang.ProcessImpl.start(ProcessImpl.java:136)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)

And when I run following command then also it throws similar error. My guess streaming application should execute python application but it is trying to execute this as java application Please suggest solution. Thanks in advance

Upvotes: 0

Views: 1022

Answers (1)

hbr
hbr

Reputation: 469

you might want to try this option as well

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'python RandomSample.py 10' -file RandomSample.py

If you try this, you might not get this error. But you might get access denied 'python RandomSample.py' , try to give the full path to the python exe. like

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'c:\mfiles\python RandomSample.py 10' -file RandomSample.py

good luck

Upvotes: 2

Related Questions