Reputation: 15
So I am brand new to Hadoop and the command line, although I have done some programming before (as a student). I am trying to run a few simple programs (part of a tutorial) from PuTTY on the school machine.
I have gotten Hadoop commands to work before and have run a different simple program just fine, but I am stuck on this one. No, this is not homework, just a tutorial to get to know the Hadoop commands.
Instructions say the following:
/*
Testing the Code
We perform local testing conforming to typical UNIX-style piping; our testing will take the form:
cat | map | sort | reduce
which emulates the same pipeline that Hadoop will perform when streaming, albeit in a non-distributed manner. You have to make sure that the files mapper.py and reducer.py have execution permissions:
chmod u+x mapper.py
chmod u+x reducer.py
Try the following command and explain the results (hint: type man sort in your terminal window to find out more about the sort command):
echo "this is a test and this should count the number of words" | ./mapper.py | sort -k1,1 | ./reducer.py
*/
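For reference, the tutorial's mapper.py and reducer.py are the standard word-count pair; a minimal sketch of what they look like (my actual files may differ slightly) is:
mapper.py:
#!/usr/bin/env python
# read lines from stdin, emit "word<TAB>1" for every word
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))
reducer.py:
#!/usr/bin/env python
# sum the counts per word; relies on input being sorted by key (hence sort -k1,1)
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue  # skip malformed lines
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

# flush the last word
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))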
Running "hdfs dfs -ls /user/$USER gives the following result:
Found 6 items
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/QuasiMonteCarlo_1445334654365_163883167
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/data
-rw-r--r--   3 s1353460 s1353460    360 2015-10-20 12:13 /user/s1353460/mapper.py
-rw-r--r--   3 s1353460 s1353460  15346 2015-10-20 11:11 /user/s1353460/part-r-00000
-rw-r--r--   2 s1353460 s1353460    728 2015-10-21 10:21 /user/s1353460/reducer.py
drwxr-xr-x   - s1353460 s1353460      0 2015-10-16 14:38 /user/s1353460/source
But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:
-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory
which seems odd, since just above they were listed at exactly that position. Any idea what might be going on here?
Upvotes: 0
Views: 2768
Reputation: 2984
But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:
-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory
You have created mapper.py & reducer.py on HDFS. When you run this command, it looks for mapper.py and reducer.py on your local file system, not on HDFS.
To fix this issue:
1. Ensure /user/s1353460/ exists on your local file system. If it doesn't, create it, then copy or create mapper.py & reducer.py in /user/s1353460/ (one way to copy them down from HDFS is sketched after this list).
2. Make sure mapper.py has execution permission: chmod +x /user/s1353460/mapper.py
3. Make sure reducer.py has execution permission: chmod +x /user/s1353460/reducer.py
4. Run: echo "this is a test and this should count the number of words" | /user/s1353460/mapper.py | sort -k1,1 | /user/s1353460/reducer.py
It should work this time without any error.
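For the copy step in 1. above, a minimal sketch of pulling the scripts down from HDFS (assuming you have permission to create /user/s1353460 on the local disk) is:
mkdir -p /user/s1353460
hdfs dfs -get /user/s1353460/mapper.py /user/s1353460/mapper.py
hdfs dfs -get /user/s1353460/reducer.py /user/s1353460/reducer.py
chmod +x /user/s1353460/mapper.py /user/s1353460/reducer.py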
To run the Python MapReduce job on the Hadoop cluster:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
-file /user/s1353460/mapper.py -mapper /user/s1353460/mapper.py \
-file /user/s1353460/reducer.py -reducer /user/s1353460/reducer.py \
-input <hdfs-input-path> -output <hdfs-output-path>
Assumption: Hadoop is installed in /usr/local/hadoop. Change the path appropriately.
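Once the job finishes, you can check the result with, for example:
hdfs dfs -cat <hdfs-output-path>/part-*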
Upvotes: 1
Reputation: 191701
Basically, using echo, you are testing your files locally and not touching HDFS at all. HDFS is a file system abstraction... but that's another topic.
If mapper.py or reducer.py are not in your current directory, you'll get the error you mentioned, regardless of whether they exist in HDFS at the same path.
To use your local Python files with Hadoop streaming, you need to use the streaming jar (its location depends on your installation); see this post here.
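If you don't know where the streaming jar lives on your system, something like this (assuming $HADOOP_HOME points at your installation) will usually find it:
find $HADOOP_HOME -name 'hadoop-streaming*.jar'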
Upvotes: 0