Reputation: 91
When adding a custom jar step for an EMR cluster - how do you set the classpath to a dependent jar (required library)?
Let's say I have my jar file, myjar.jar, but it needs an external jar, dependency.jar, to run. Where do you configure this when creating the cluster? I am not using the command line; I am using the Advanced Options interface.
Thought I would post this after spending a number of hours poking around and reading outdated documentation.
The 2.x/3.x documentation that talks about setting HADOOP_CLASSPATH does not work, and it states that this approach does not apply to 4.x and above anyway. Somewhere you need to specify a --libjars option. However, specifying that in the arguments list does not work either.
For example:
Step Name: MyCustomStep
Jar Location: s3://somebucket/myjar.jar
Arguments: myclassname option1 option2 --libjars dependentlib.jar
Upvotes: 1
Views: 1221
Reputation: 21
Copy your required jars to /usr/lib/hadoop-mapreduce/ in a bootstrap action. No other changes are necessary. Additional info below:
The command below works for me to copy a specific JDBC driver version:
sudo aws s3 cp s3://<your bucket>/mysql-connector-java-5.1.23-bin.jar /usr/lib/hadoop-mapreduce/
I have other dependencies, so I have a bootstrap action for each jar I need copied. Of course, you could put all the copies in a single bash script. Below is the .NET code I use to get a bootstrap action to run the copy script. I am using .NET SDK versions 3.3.* and launching the job with release label emr-5.2.0.
public static BootstrapActionConfig CopyEmrJarDependency(string jarName)
{
    return new BootstrapActionConfig()
    {
        Name = $"Copy jars for EMR dependency: {jarName}",
        ScriptBootstrapAction = new ScriptBootstrapActionConfig()
        {
            Path = $"s3n://{Config.AwsS3CodeBucketName}/EMR/Scripts/copy-thirdPartyJar.sh",
            Args = new List<string>()
            {
                $"s3://{Config.AwsS3CodeBucketName}/EMR/Java/lib/{jarName}",
                "/usr/lib/hadoop-mapreduce/"
            }
        }
    };
}
Note that the ScriptBootstrapActionConfig Path property uses the protocol "s3n://", but the protocol for the aws s3 cp command should be "s3://".
My script copy-thirdPartyJar.sh contains the following:
#!/bin/bash
# $1 = location of jar
# $2 = attempted magic directory for java classpath
sudo aws s3 cp "$1" "$2"
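If you would rather have one bootstrap action than one per jar, the copies can be combined into a single script, as mentioned above. The following is only a rough sketch, not something I have run on a cluster: the script name copy-all-jars.sh, the function name copy_jars, and the COPY_CMD override (there just so you can dry-run with echo) are all made up for illustration. It assumes every jar sits under the same S3 prefix.

```shell
#!/bin/bash
# copy-all-jars.sh (hypothetical name)
# Usage: copy-all-jars.sh <s3 prefix> <jar> [<jar> ...]
# Copies each named jar from the S3 prefix into the directory used above.

DEST_DIR="/usr/lib/hadoop-mapreduce/"

copy_jars() {
  if [ "$#" -lt 2 ]; then
    echo "usage: copy_jars <s3 prefix> <jar> [<jar> ...]" >&2
    return 1
  fi
  # Drop a trailing slash so the joined path has exactly one separator.
  local prefix="${1%/}"
  shift
  local jar
  for jar in "$@"; do
    # COPY_CMD is a hypothetical override for dry runs (e.g. COPY_CMD=echo);
    # by default this runs the same command as copy-thirdPartyJar.sh.
    ${COPY_CMD:-sudo aws s3 cp} "${prefix}/${jar}" "${DEST_DIR}" || return 1
  done
}

# Only run when arguments are passed, as they are for a bootstrap action.
if [ "$#" -gt 0 ]; then
  copy_jars "$@"
fi
```

The Args list for the bootstrap action would then be the S3 prefix followed by every jar name. A failed copy makes the script exit non-zero, so a missing jar should show up as a failed bootstrap action rather than as a ClassNotFoundException later in the step.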
Upvotes: 2