user253202

How to distribute jar to hadoop before Job submission

I want to implement a REST API to submit Hadoop jobs for execution. This is done purely via Java code. If I compile a jar file and execute it via "hadoop jar", everything works as expected. But when I submit a Hadoop job via Java code in my REST API, the job is submitted but fails with a ClassNotFoundException. Is it possible to somehow deploy the jar file (with the code of my jobs) to Hadoop (the NodeManagers and their containers) so that Hadoop will be able to locate the jar file by class name? Should I copy the jar file to each NodeManager and set HADOOP_CLASSPATH there?

Upvotes: 2

Views: 1812

Answers (1)

SelimN

Reputation: 212

You can create a method that adds the jar file to Hadoop's distributed cache, so it will be available to the task nodes when needed:

private static void addJarToDistributedCache(
        String jarPath, Configuration conf) throws IOException {

    File jarFile = new File(jarPath);

    // Declare new HDFS location
    Path hdfsJar = new Path(jarFile.getName());

    // Mount HDFS
    FileSystem hdfs = FileSystem.get(conf);

    // Copy (overwrite) jar file to HDFS
    hdfs.copyFromLocalFile(false, true,
        new Path(jarPath), hdfsJar);

    // Add jar to distributed classpath
    DistributedCache.addFileToClassPath(hdfsJar, conf);
}

Then, in your application, call addJarToDistributedCache before submitting your job:

public static void main(String[] args) throws Exception {

    // Create Hadoop configuration
    Configuration conf = new Configuration();

    // Add 3rd-party libraries
    addJarToDistributedCache("/tmp/hadoop_app/file.jar", conf);

    // Create my job
    Job job = new Job(conf, "Hadoop-classpath");
    .../...
}
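Note that DistributedCache is deprecated in Hadoop 2.x and later. A rough equivalent using only the org.apache.hadoop.mapreduce.Job API (a sketch, assuming Hadoop 2+; the class and method names JarShipper/createJobWithJar are my own) would be:

    import java.io.File;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class JarShipper {
        public static Job createJobWithJar(String jarPath, Configuration conf)
                throws IOException {
            Job job = Job.getInstance(conf, "Hadoop-classpath");

            // Copy the jar into HDFS so every node can fetch it
            Path hdfsJar = new Path(new File(jarPath).getName());
            FileSystem.get(conf).copyFromLocalFile(false, true,
                    new Path(jarPath), hdfsJar);

            // Ship the jar to the task classpath
            // (replaces DistributedCache.addFileToClassPath)
            job.addFileToClassPath(hdfsJar);
            return job;
        }
    }

The YARN containers then download the jar from HDFS and add it to the task classpath automatically, which avoids having to copy it to each NodeManager by hand.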

You can find more details in this blog:

Upvotes: 1
