Reputation: 6343
I have a C++ service which exposes 2 interfaces:
a. Submit(): submits a DistCp job to the YARN ResourceManager (RM).
b. Query(): queries the status of the application.
This service internally calls a Java client (through JNI), which has 2 static functions:
Submit()
Query()
Submit() does:
DistCp distCp = new DistCp(configuration, distCpOptions);
Job job = distCp.execute();
It then parses the application ID from the job's tracking URL and returns it.
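For reference, a minimal, self-contained sketch of the ID-extraction step (the tracking URL, host, and application ID below are made up; the real URL comes from Job#getTrackingURL()):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppIdParser {
    // YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID =
            Pattern.compile("(application_\\d+_\\d+)");

    /** Returns the application ID embedded in a tracking URL, or null if absent. */
    public static String parse(String trackingUrl) {
        Matcher m = APP_ID.matcher(trackingUrl);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical RM proxy URL of the usual shape.
        String url = "http://rm-host:8088/proxy/application_1428919001234_0042/";
        System.out.println(parse(url)); // application_1428919001234_0042
    }
}
```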
Query() takes the application ID returned by Submit() and does:
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(new YarnConfiguration());
yarnClient.start();
yarnClient.getApplicationReport(applicationID);
yarnClient.stop();
The problem I am facing is:
Query() calls succeed under all conditions.
Submit() calls fail with different exceptions (1st, 2nd and 3rd calls below):
java.util.ServiceConfigurationError: org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider: Provider org.apache.hadoop.mapred.LocalClientProtocolProvider not found
java.util.ServiceConfigurationError: org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider: Provider org.apache.hadoop.mapred.YarnClientProtocolProvider not found
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
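For context: these ServiceConfigurationErrors are raised by java.util.ServiceLoader, which Hadoop's Cluster class uses to discover ClientProtocolProvider implementations through META-INF/services entries on the classpath. The following self-contained sketch (no Hadoop required; the provider name com.example.MissingProvider is made up) reproduces the same kind of error by registering a provider class that the class loader cannot resolve:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class MissingProviderDemo {
    // Registers a non-existent provider class for the Runnable "service"
    // and returns the resulting ServiceConfigurationError message.
    public static String triggerError() throws IOException {
        Path dir = Files.createTempDirectory("spi-demo");
        Path services = dir.resolve("META-INF/services/java.lang.Runnable");
        Files.createDirectories(services.getParent());
        // Deliberately name a class that is absent from the classpath.
        Files.write(services, "com.example.MissingProvider".getBytes());

        ClassLoader cl = new URLClassLoader(new URL[] { dir.toUri().toURL() });
        try {
            for (Runnable r : ServiceLoader.load(Runnable.class, cl)) {
                // Never reached: the provider class cannot be resolved.
            }
            return "no error";
        } catch (ServiceConfigurationError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints: java.lang.Runnable: Provider com.example.MissingProvider not found
        System.out.println(triggerError());
    }
}
```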
I debugged the issue and found that when the Query() API is called first, the classes LocalClientProtocolProvider and YarnClientProtocolProvider are not loaded. The class loader should load these classes when Submit() is called, but that is not happening.
I also observed that when the Query() API is called first, the Hadoop configuration gets changed and picks up many additional default settings related to the "mapreduce.*" configuration.
I tried explicitly loading the classes with Class.forName() as soon as the Submit() method is called, but that did not help either.
When Submit() is called, why does the class loader not load the required classes? Is this a problem with the Hadoop configuration or with the Java class loader? Or is it because I am mixing MapReduce and YARN APIs?
"mapreduce.framework.name" configuration is set to "yarn".
My environment is Hadoop 2.6.0.
My classpath contains, all the Hadoop jars present in following paths:
a. hadoop/common/
b. hadoop/common/lib
c. hadoop/hdfs/
d. hadoop/hdfs/lib
e. hadoop/mapreduce/
f. hadoop/mapreduce/lib
g. hadoop/yarn/
h. hadoop/yarn/lib
Upvotes: 0
Views: 1100
Reputation: 6343
I figured out that I am mixing YARN and MapReduce APIs, and that is causing the class-loading problems.
When Query() is called first, it loads all the YARN-related classes. For example:
org.apache.hadoop.yarn.client.api.YarnClient from file:/D:/data/hadoop-2.6.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-client-2.6.0-SNAPSHOT.jar
But the MapReduce-related classes are not loaded. For example, the following class is not loaded:
org.apache.hadoop.mapred.YarnClientProtocolProvider from file:/D:/data/hadoop-2.6.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-SNAPSHOT.jar
So, when Submit() is called, the class loader assumes it has already loaded all the required classes. But the classes YarnClientProtocolProvider and LocalClientProtocolProvider are not loaded yet. Hence, the Submit() call fails.
To force the class loader to load all the MapReduce-related classes, I added the following statements to the constructor of YarnClientWrapper (a singleton class that wraps YarnClient):
Cluster cluster = new Cluster(configuration);
cluster.getFileSystem();
cluster.close();
This resolved the issue.
But a cleaner implementation would be to use the MapReduce client in Query() instead of YarnClient. That ensures we do not run into class-loading issues.
Upvotes: 1