Reputation: 15
So I am brand new to Hadoop and the command line, although I have done some programming before (as a student). I am trying to run a few simple programs (part of a tutorial) from PuTTY on the school machine.
I have gotten Hadoop commands to work before and have run a different simple program just fine, but I am stuck on this one. No, this is not homework, just a tutorial to get to know the Hadoop commands.
Instructions say the following:
/*
Testing the Code
We perform local testing conforming to typical UNIX-style piping; our testing will take the form:
cat | map | sort | reduce
which emulates the same pipeline that Hadoop will perform when streaming, albeit in a non-distributed manner. You have to make sure that the files mapper.py and reducer.py have execution permissions:
chmod u+x mapper.py
chmod u+x reducer.py
Try the following command and explain the results (hint: type man sort in your terminal window to find out more about the sort command):
echo "this is a test and this should count the number of words" | ./mapper.py | sort -k1,1 | ./reducer.py
*/
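For reference, the tutorial's mapper.py and reducer.py are the standard word-count pair; a minimal sketch of what they look like (my actual files may differ slightly) is:
mapper.py:
#!/usr/bin/env python
# read lines from stdin, emit "word<TAB>1" for every word
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))
reducer.py:
#!/usr/bin/env python
# sum the counts per word; relies on input being sorted by key (hence sort -k1,1)
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue  # skip malformed lines
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

# flush the last word
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))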
Running "hdfs dfs -ls /user/$USER gives the following result:
Found 6 items
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/QuasiMonteCarlo_1445334654365_163883167
drwxr-xr-x   - s1353460 s1353460      0 2015-10-20 10:51 /user/s1353460/data
-rw-r--r--   3 s1353460 s1353460    360 2015-10-20 12:13 /user/s1353460/mapper.py
-rw-r--r--   3 s1353460 s1353460  15346 2015-10-20 11:11 /user/s1353460/part-r-00000
-rw-r--r--   2 s1353460 s1353460    728 2015-10-21 10:21 /user/s1353460/reducer.py
drwxr-xr-x   - s1353460 s1353460      0 2015-10-16 14:38 /user/s1353460/source
But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:
-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory
which seems odd, since just above they were listed at exactly that position. Any idea what might be going on here?
Upvotes: 0
Views: 2768
Reputation: 2984
But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:
-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory
You have created mapper.py & reducer.py on HDFS. When you run this command, it looks for mapper.py and reducer.py on your local file system, not on HDFS.
To fix this issue:
1. Ensure /user/s1353460/ exists on your local file system. If it doesn't, create it, then copy or create mapper.py & reducer.py in /user/s1353460/ (one way to copy them down from HDFS is sketched after this list).
2. Make sure mapper.py has execution permission: chmod +x /user/s1353460/mapper.py
3. Make sure reducer.py has execution permission: chmod +x /user/s1353460/reducer.py
4. Run: echo "this is a test and this should count the number of words" | /user/s1353460/mapper.py | sort -k1,1 | /user/s1353460/reducer.py
It should work this time without any error.
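For the copy step in 1. above, a minimal sketch of pulling the scripts down from HDFS (assuming you have permission to create /user/s1353460 on the local disk) is:
mkdir -p /user/s1353460
hdfs dfs -get /user/s1353460/mapper.py /user/s1353460/mapper.py
hdfs dfs -get /user/s1353460/reducer.py /user/s1353460/reducer.py
chmod +x /user/s1353460/mapper.py /user/s1353460/reducer.py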
To run the Python MapReduce job on the Hadoop cluster:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
-file /user/s1353460/mapper.py -mapper /user/s1353460/mapper.py \
-file /user/s1353460/reducer.py -reducer /user/s1353460/reducer.py \
-input <hdfs-input-path> -output <hdfs-output-path>
Assumption: Hadoop is installed in /usr/local/hadoop. Change the path appropriately.
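Once the job finishes, you can check the result with, for example:
hdfs dfs -cat <hdfs-output-path>/part-*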
Upvotes: 1
Reputation: 191701
Basically, using echo, you are testing your files locally and not touching HDFS at all. HDFS is a file system abstraction... but that's another topic.
If mapper.py or reducer.py are not in your current directory, you'll get the error you mentioned, regardless of whether they exist in HDFS at the same path.
To use your local Python files with Hadoop streaming, you need to use the streaming jar (its location depends on your installation); see this post here.
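If you don't know where the streaming jar lives on your system, something like this (assuming $HADOOP_HOME points at your installation) will usually find it:
find $HADOOP_HOME -name 'hadoop-streaming*.jar'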
Upvotes: 0