Reputation: 971

Hadoop missing input which is present in HDFS

Evening all,

I'm trying to run a training sample on Hadoop mapreduce, but am receiving an error that the input path does not exist.

16/09/26 05:56:45 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: hdfs://bigtop1.vagrant:8020/training

However, looking inside the HDFS directory, it's clear that the "training" folder is present.

[vagrant@bigtop1 code]$ hadoop fs -ls
Found 3 items
drwx------   - vagrant hadoop          0 2016-09-26 05:47 .staging
drwxr-xr-x   - vagrant hadoop          0 2016-09-26 04:28 hw2
drwxr-xr-x   - vagrant hadoop          0 2016-09-26 04:14 training

Using HDFS commands:

[vagrant@bigtop1 code]$ hdfs dfs -ls training
Found 2 items
-rw-r--r--   3 vagrant hadoop          0 2016-09-26 04:14 training/_SUCCESS
-rw-r--r--   3 vagrant hadoop    3311720 2016-09-26 04:14 training/part-r-00000

Does anyone know of a possible reason that Hadoop would be missing data that is clearly present?

Invocation below (I had to hide one input, -f):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -D mapreduce.job.reduces=5 -files lr -mapper "python lr/mapper.py -n 5 -r 0.4" -reducer "python lr/reducer.py -e 0.1 -c 0.0 -f ####" -input /training/ -output /models

Upvotes: 1

Answers (3)

Bhavesh

Reputation: 919

Please change the input parameter to something like this.

From

-input /training/

to

-input training/

Upvotes: 1

Sarath Sasikumar

Reputation: 123

Please change the input parameter to something like this.

-input hdfs://<machinename>/user/vagrant/training/

Upvotes: 0

Binary Nerd

Reputation: 13927

When you run $ hadoop fs -ls, it shows you the data in the current user's home directory.

Are you sure the path to your data isn't under /user/vagrant/ (i.e. /user/vagrant/training)?

If the training directory isn't present when you run $ hadoop fs -ls /, then you have the path wrong.
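To see why the two listings disagree, here is a minimal bash sketch of how a path passed to -input gets qualified. It assumes the default setup, where a relative path is resolved against the working directory /user/<user> and an absolute path is resolved against fs.defaultFS. The function resolve_hdfs_path is hypothetical: it only models the resolution rule and does not call Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): models how an HDFS client
# qualifies a path before the job checks that the input exists.
#   $1 = path as passed to -input
#   $2 = default filesystem (fs.defaultFS), e.g. hdfs://bigtop1.vagrant:8020
#   $3 = user name; the working directory is ASSUMED to be /user/<user>
resolve_hdfs_path() {
  local path="$1" default_fs="$2" user="$3"
  # Hadoop's Path drops a trailing slash, so "/training/" becomes "/training".
  if [ "${#path}" -gt 1 ]; then path="${path%/}"; fi
  case "$path" in
    *://*)  # already fully qualified: used as-is
      printf '%s\n' "$path" ;;
    /*)     # absolute: resolved from the filesystem root
      printf '%s%s\n' "$default_fs" "$path" ;;
    *)      # relative: resolved against the user's home directory
      printf '%s/user/%s/%s\n' "$default_fs" "$user" "$path" ;;
  esac
}

fs="hdfs://bigtop1.vagrant:8020"
resolve_hdfs_path /training/ "$fs" vagrant   # the path the job looked for
resolve_hdfs_path training/ "$fs" vagrant    # where hadoop fs -ls found it
```

The first call reproduces the path from the error message (hdfs://bigtop1.vagrant:8020/training), while the second gives hdfs://bigtop1.vagrant:8020/user/vagrant/training, the directory that a bare hadoop fs -ls lists. That is why either a relative -input training/ or a fully qualified /user/vagrant/training URI should find the data.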

Upvotes: 0
