Kasa

Reputation: 189

Use application configuration over Hadoop configuration

I have built a Java application using Maven. It is packaged as an executable jar using the Maven Shade plugin. This application does several things - one of those is to upload data to a Hadoop cluster. I execute the program using the following:

$ hadoop jar <app_name>.jar <app_arg1> <app_arg2> ...

My application uses SLF4J with the Log4J bindings for logging - and so does Hadoop.

When using the hadoop jar command, Hadoop's own Log4J configuration file overrides my application's Log4J configuration file.

How can I prevent my application's Log4J configuration file from being overridden?

NOTES:

EDIT 1: (10/02/2015)

I've done a few things.

First, I changed the name of my Log4J configuration file to avoid the name collision with the default log4j.properties file that Hadoop uses:

log4j-<app_name>.properties

Second, I set the HADOOP_OPTS environment variable to tell Log4J the name of my configuration file:

HADOOP_OPTS=-Dlog4j.configuration=log4j-<app_name>.properties

Third, I set the HADOOP_CLASSPATH environment variable to ensure that the configuration file packaged within the uber jar is picked up by the hadoop jar command:

HADOOP_CLASSPATH=/absolute/path/to/<app_name>.jar

With these changes, my application now uses its own Log4J configuration file as intended. It feels like a hack (I would have preferred to use the java -jar command), but it resolved my issue.
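
To check which configuration file would actually be picked up, a small diagnostic along these lines may help. It is only a sketch: the class name Log4jConfigCheck and its two helper methods are made up for illustration, it assumes the log4j.configuration value is a classpath resource name rather than a URL, and it ignores the log4j.xml lookup that Log4J 1.x also performs by default.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Log4J.
class Log4jConfigCheck {

    // The resource name Log4J 1.x is asked to load: the value of the
    // log4j.configuration system property, or log4j.properties if unset.
    public static String configName() {
        return System.getProperty("log4j.configuration", "log4j.properties");
    }

    // Every classpath location that provides the named resource,
    // in the order the class loader reports them.
    public static List<URL> locate(String resource) throws IOException {
        ClassLoader loader = Log4jConfigCheck.class.getClassLoader();
        return new ArrayList<>(Collections.list(loader.getResources(resource)));
    }

    public static void main(String[] args) throws IOException {
        String name = configName();
        List<URL> hits = locate(name);
        System.out.println("Looking for: " + name);
        if (hits.isEmpty()) {
            System.out.println("  not found on the classpath");
        }
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is listed, a file with that name exists in several places on the classpath, which is the kind of collision the rename above was meant to avoid.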

Upvotes: 3

Views: 1568

Answers (1)

YoungHobbit

Reputation: 13402

By default, the Hadoop framework jars appear before the user's jars on the classpath. You can give your own (user) jars precedence by passing the -Dmapreduce.job.user.classpath.first=true parameter in the command. The new command looks like this:

hadoop jar <app_name>.jar -Dmapreduce.job.user.classpath.first=true <app_arg1> <app_arg2> ...

Alternatively, you can put the following configuration in your mapred-site.xml so that the user classpath always takes precedence:

<property>
    <name>mapreduce.job.user.classpath.first</name>
    <value>true</value>
</property>

You can also set this programmatically in the job configuration:

job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");

Any of these approaches works; use whichever is most convenient.

Upvotes: 1
