Reputation: 309
I'm trying to start a MapReduce job from Java code and submit it to YARN, but I get the following error:
2018-08-26 00:46:26,075 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-08-26 00:46:27,526 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager at hdcluster01/10.211.55.22:8032
2018-08-26 00:46:28,135 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-08-26 00:46:28,217 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(280)) - Total input paths to process : 1
2018-08-26 00:46:28,254 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2018-08-26 00:46:28,364 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_1535213323614_0008
2018-08-26 00:46:28,484 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(204)) - Submitted application application_1535213323614_0008
2018-08-26 00:46:28,506 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://hdcluster01:8088/proxy/application_1535213323614_0008/
2018-08-26 00:46:28,506 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_1535213323614_0008
2018-08-26 00:46:32,536 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_1535213323614_0008 running in uber mode : false
2018-08-26 00:46:32,537 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 0% reduce 0%
2018-08-26 00:46:32,547 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1535213323614_0008 failed with state FAILED due to: Application application_1535213323614_0008 failed 2 times due to AM Container for appattempt_1535213323614_0008_000002 exited with exitCode: -1000 due to: File file:/tmp/hadoop-yarn/staging/nasuf/.staging/job_1535213323614_0008/job.jar does not exist
.Failing this attempt.. Failing the application.
2018-08-26 00:46:32,570 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0
The error:
Job job_1535213323614_0008 failed with state FAILED due to: Application application_1535213323614_0008 failed 2 times due to AM Container for appattempt_1535213323614_0008_000002 exited with exitCode: -1000 due to: File file:/tmp/hadoop-yarn/staging/nasuf/.staging/job_1535213323614_0008/job.jar does not exist
.Failing this attempt.. Failing the application.
I can't figure out why I get this error. I can run the jar successfully from the command line, but it fails when submitted from Java code. I checked the path, and /tmp/hadoop-yarn/ doesn't even exist. Also, my local user is nasuf while the user running Hadoop is parallels, so they are not the same; my local OS is macOS and Hadoop runs on CentOS 7.
The mapper code is as follows:
import java.io.IOException;

import org.apache.commons.lang.StringUtils; // assumed; use whichever StringUtils you already import
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
The reducer code is as follows:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
The runner code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.jar", "wc.jar");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "hdcluster01");
        conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");

        Job job = Job.getInstance(conf);
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path("hdfs://hdcluster01:9000/wc/srcdata"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hdcluster01:9000/wc/output3"));

        job.waitForCompletion(true);
    }
}
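Incidentally, the log also warns "Implement the Tool interface and execute your application with ToolRunner". That warning is separate from the failure, but a ToolRunner-based driver is the idiomatic shape; a sketch of how the runner above could be restructured (the class name WCRunnerTool is made up here, and this is an untested outline, not a fix for the staging error):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCRunnerTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that ToolRunner populated,
        // including any -D key=value options parsed from the command line,
        // so generic options no longer need to be hard-coded.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WCRunnerTool.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://hdcluster01:9000/wc/srcdata"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hdcluster01:9000/wc/output3"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WCRunnerTool(), args));
    }
}
```

This needs a live cluster (and the Hadoop jars on the classpath) to actually run, so treat it only as a structural sketch.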
Can anyone help with this? Many thanks!
Upvotes: 0
Views: 653
Reputation: 309
I've resolved this problem. Just put core-site.xml on the classpath, or add the following configuration in the code:
conf.set("hadoop.tmp.dir", "/home/parallels/app/hadoop-2.4.1/data/");
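To explain why this works: when core-site.xml is missing from the client classpath, the job client falls back to defaults, and the staging directory under hadoop.tmp.dir resolves against the local file:// filesystem of the submitting machine instead of the cluster, which is exactly why the NodeManager reports that file:/tmp/hadoop-yarn/staging/.../job.jar does not exist. For reference, a minimal core-site.xml fragment with the entries that matter here might look like this (the fs.defaultFS value is assumed from the hdfs:// URIs in the runner code, and the paths should be adjusted to your cluster):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdcluster01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/parallels/app/hadoop-2.4.1/data/</value>
  </property>
</configuration>
```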
Upvotes: 1