Hadoop: No such file or directory

So I am brand new to Hadoop and the command line, although I have done some programming before (as a student). I am trying to run a few simple programs (part of a tutorial) from PuTTY on the school machine.

I have gotten Hadoop commands to work before and have run a different simple program just fine, but I am stuck on this one. No, this is not homework; just a tutorial to get to know the Hadoop commands.

Instructions say the following:

/*

Testing the Code

We perform local testing conforming to typical UNIX-style piping; our testing will take the form:

cat | map | sort | reduce

which emulates the same pipeline that Hadoop will perform when streaming, albeit in a non-distributed manner. You have to make sure that the files mapper.py and reducer.py have execution permissions:

chmod u+x mapper.py
chmod u+x reducer.py

Try the following command and explain the results (hint: type man sort in your terminal window to find out more about the sort command):

echo "this is a test and this should count the number of words" | ./mapper.py | sort -k1,1 | ./reducer.py

*/
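For reference, my mapper.py and reducer.py are essentially the standard word-count pair from the tutorial; roughly this (reproduced from memory, so my actual files may differ slightly):

#!/usr/bin/env python
# mapper.py -- read lines from stdin, emit "word<TAB>1" for each word
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))

#!/usr/bin/env python
# reducer.py -- sum the counts per word; assumes input is sorted by key,
# which is what the "sort -k1,1" step in the pipeline guarantees
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

# flush the last word
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))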

Running "hdfs dfs -ls /user/$USER gives the following result:

Found 6 items
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/QuasiMonteCarlo_1445334654365_163883167
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/data
-rw-r--r--   3 s1353460 s1353460    360 2015-10-20 12:13 /user/s1353460/mapper.py
-rw-r--r--   3 s1353460 s1353460  15346 2015-10-20 11:11 /user/s1353460/part-r-00000
-rw-r--r--   2 s1353460 s1353460    728 2015-10-21 10:21 /user/s1353460/reducer.py
drwxr-xr-x   - s1353460 s1353460      0 2015-10-16 14:38 /user/s1353460/source

But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:

-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory

which seems odd, since they were listed at exactly those paths just above. Any idea what might be going on here?

Upvotes: 0

Views: 2768

Answers (2)

Vinkal

Reputation: 2984

But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:

-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory

You have created mapper.py and reducer.py on HDFS. When you run this command, the shell looks for mapper.py and reducer.py on your local file system, not on HDFS.
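You can see the difference directly; for example (the paths here just mirror the ones in the question):

# lists the local file system; this is the path bash resolves in your pipeline
ls -l /user/$USER

# lists HDFS; only the hdfs client sees this namespace
hdfs dfs -ls /user/$USER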

To fix this issue:

  1. Ensure /user/s1353460/ exists on your local file system. If it doesn't, create it and then copy or create mapper.py and reducer.py in /user/s1353460/ (one way to do this is sketched after this list).

  2. Make sure mapper.py has execution permission: chmod +x /user/s1353460/mapper.py

  3. Make sure reducer.py has execution permission: chmod +x /user/s1353460/reducer.py

  4. Run the pipeline again:

echo "this is a test and this should count the number of words" | /user/s1353460/mapper.py | sort -k1,1 | /user/s1353460/reducer.py

It should work this time without any error.
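For step 1, since the scripts already exist on HDFS, one way to get local copies is to pull them down with hdfs dfs -get. A sketch (note that /user/s1353460 is an unusual location for a local directory, so substitute whatever local path you prefer):

# create the local directory (this is a local path, not HDFS)
mkdir -p /user/s1353460

# copy both scripts from HDFS to the local file system
hdfs dfs -get /user/s1353460/mapper.py /user/s1353460/mapper.py
hdfs dfs -get /user/s1353460/reducer.py /user/s1353460/reducer.py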

To run the Python MapReduce job on the Hadoop cluster:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
-file /user/s1353460/mapper.py    -mapper /user/s1353460/mapper.py \
-file /user/s1353460/reducer.py   -reducer /user/s1353460/reducer.py \
-input <hdfs-input-path> -output <hdfs-output-path>

Assumption: Hadoop is installed in /usr/local/hadoop. Change the path appropriately.

Upvotes: 1

OneCricketeer

Reputation: 191701

Basically, using echo, you are testing your files locally and not touching HDFS at all. HDFS is a file system abstraction... but that's another topic.

If mapper.py or reducer.py are not in your current directory, you'll get the error you describe regardless of whether they exist in HDFS at the same path.

To use your local Python files with Hadoop streaming, you need to use the streaming jar (its location depends on your installation); see this post here.
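If you're not sure where the streaming jar lives on your installation, searching the Hadoop directory is one way to find it (a sketch; $HADOOP_HOME is assumed to point at your install, e.g. /usr/local/hadoop):

# look for the streaming jar anywhere under the Hadoop install
find "$HADOOP_HOME" -name 'hadoop-streaming*.jar' 2>/dev/null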

Upvotes: 0
