Reputation: 457
My Hadoop version is 2.6.0-cdh5.10.0 and I am using a Cloudera VM.
I am trying to access the HDFS file system from my code so that I can read files and add them as job input or as a cache file.
When I access HDFS through the command line, I am able to list the files.
Command:
[cloudera@quickstart java]$ hadoop fs -ls hdfs://localhost:8020/user/cloudera
Found 5 items
-rw-r--r-- 1 cloudera cloudera 106 2017-02-19 15:48 hdfs://localhost:8020/user/cloudera/test
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:42 hdfs://localhost:8020/user/cloudera/test_op
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:49 hdfs://localhost:8020/user/cloudera/test_op1
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:12 hdfs://localhost:8020/user/cloudera/wc_output
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:16 hdfs://localhost:8020/user/cloudera/wc_output1
When I try to access the same path through my MapReduce program, I receive a FileNotFoundException. My MapReduce driver configuration code is:
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    if (args.length != 2) {
        System.err.println("Usage: test <in> <out>");
        System.exit(2);
    }
    ConfigurationUtil.dumpConfigurations(conf, System.out);
    LOG.info("input: " + args[0] + " output: " + args[1]);

    Job job = Job.getInstance(conf);
    job.setJobName("test");
    job.setJarByClass(Driver.class);
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);

    job.addCacheFile(new Path("hdfs://localhost:8020/user/cloudera/test/test.tsv").toUri());

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean result = job.waitForCompletion(true);
    return (result) ? 0 : 1;
}
The job.addCacheFile line in the above snippet is what leads to the FileNotFoundException.
2) My second question is:
My entry in core-site.xml points to localhost:9000 as the default HDFS file system URI. But at the command prompt I can access the default HDFS file system only at port 8020, not at 9000. When I tried using port 9000, I ended up with a ConnectionRefused exception. I am not sure where the configuration is actually read from.
My core-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!--
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/student/tmp/hadoop-local/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>Default file system URI. URI: scheme://authority/path; scheme: method of access; authority: host, port, etc.</description>
  </property>
</configuration>
My hdfs-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name
    node should store the name table (fsimage).</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. Usually 3, 1 in our case.</description>
  </property>
</configuration>
I am receiving the following exception:
java.io.FileNotFoundException: hdfs:/localhost:8020/user/cloudera/test/ (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at java.io.FileReader.<init>(FileReader.java:58)
at hadoop.TestDriver$ActorWeightReducer.setup(TestDriver.java:104)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any help will be useful!
Upvotes: 0
Views: 1261
Reputation: 1006
You are not required to give the full URI as the argument when accessing a file in HDFS. The client will add the hdfs://host_address prefix on its own, taken from core-site.xml. You just need to mention the file you want to access along with its directory structure, which in your case should be /user/cloudera/test.
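As a minimal sketch of that idea (the reducer class name, the symlink name, and the exact HDFS path are my assumptions, not your actual code; use whichever path the file really lives at): register the cache file with a path relative to the default file system, then read it in the reducer's setup() through the local symlink the framework creates in the task's working directory.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Driver side (inside run(), which already throws Exception):
// no scheme or host needed, fs.default.name supplies the hdfs:// prefix;
// the "#test.tsv" fragment names the local symlink created in the task directory.
job.addCacheFile(new URI("/user/cloudera/test/test.tsv#test.tsv"));

// Reducer side (hypothetical class): open the cached file via the local symlink,
// not via an hdfs:// URI, because java.io.FileReader only understands local paths.
public static class SampleReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        try (BufferedReader reader = new BufferedReader(new FileReader("test.tsv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // parse each line of the cached TSV here
            }
        }
    }
}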
Coming to your second question: port 8020 is the default NameNode port for HDFS. That is why you are able to access HDFS at port 8020 even though you did not configure it anywhere. The reason for the ConnectionRefused exception is that HDFS is actually running on 8020, so nothing is listening on port 9000 and the connection is refused.
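If you want to see which configuration your client code is actually picking up, a small sketch like the following (the class name is hypothetical) prints the effective default file system as resolved from the core-site.xml on the classpath:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowDefaultFs {
    public static void main(String[] args) throws Exception {
        // new Configuration() loads core-default.xml plus any core-site.xml found on the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
        System.out.println("fs.default.name = " + conf.get("fs.default.name")); // deprecated alias
        System.out.println("FileSystem URI  = " + FileSystem.get(conf).getUri());
    }
}
If this prints 8020 while your edited file says 9000, your edited core-site.xml is simply not the one on the classpath that the client reads.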
Refer here for more details about the default ports.
Upvotes: 0