Alex Woolford

Reputation: 4563

pass Hadoop arguments into Java code

I have an Uber jar that performs some Cascading ETL tasks. The jar is executed like this:

hadoop jar munge-data.jar

I'd like to pass arguments to the jar when the job is launched, e.g.

hadoop jar munge-data.jar -Denv=prod

Different credentials, hostnames, etc... will be read from properties files depending on the environment.

This would work if the job were executed with java -Denv=prod -jar munge-data.jar, since the env property could be accessed:

System.getProperty("env")

However, this doesn't work when the jar is executed with hadoop jar ....

I saw a similar thread where the answerer states that properties can be accessed using what looks like the org.apache.hadoop.conf.Configuration class. It wasn't clear to me, from the answer, how the conf object gets created. I tried the following and it returned null:

Configuration configuration = new Configuration();
System.out.println(configuration.get("env"));

Presumably, the configuration properties need to be read/set.

Can you tell me how I can pass properties, e.g. hadoop jar [...] -DsomeProperty=someValue, into my ETL job?

Upvotes: 5

Views: 4698

Answers (2)

maxteneff

Reputation: 1531

Your driver class should implement the Tool interface, which allows you to use ToolRunner to run your MapReduce job:

public class MRDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        /*...*/
    }
}

Then you'll be able to run jobs in the following way:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new MRDriver(), args);
    System.exit(res);
}

This means that all your command-line parameters are parsed by ToolRunner and set on the current instance of the Configuration class.

Assuming you run the job from the console with the following command:

hadoop jar munge-data.jar -Denv1=prod1 -Denv2=prod2

Then in the run() method you can get all your arguments from the Configuration object:

public int run(String[] args) throws Exception {
    Configuration conf = getConf();

    String env1 = conf.get("env1");
    String env2 = conf.get("env2");

    Job job = new Job(conf, "MR Job");
    job.setJarByClass(MRDriver.class);

    /*...*/

    return job.waitForCompletion(true) ? 0 : 1;
}
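To make the mechanism concrete, here is a minimal, self-contained sketch of what ToolRunner's argument parsing does conceptually. This is not Hadoop's actual code (the class name DashDParser is made up for illustration); the real work is done by GenericOptionsParser, which strips the -D options out of args and sets them on the Configuration before run() is called:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified stand-in for what GenericOptionsParser
// does with -Dkey=value arguments before your run() method is invoked.
public class DashDParser {

    // Collects every "-Dkey=value" argument into a map of properties;
    // anything else would be left for the application as a plain argument.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                String[] kv = arg.substring(2).split("=", 2);
                props.put(kv[0], kv[1]);
            }
        }
        return props;
    }
}
```

With hadoop jar munge-data.jar -Denv1=prod1 -Denv2=prod2, the same idea puts env1 and env2 into the Configuration, which is why conf.get("env1") works inside run().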

Upvotes: 5

Vignesh I

Reputation: 2221

You can pass the arguments in two ways: either using the -D option or using the Configuration object. Note that you can only use the -D option when you implement the Tool interface; otherwise you have to set the configuration variables yourself with conf.set.

Passing parameters using -D:

hadoop jar example.jar com.example.driver -D property=value /input/path /output/path

Passing parameters using Configuration:

Configuration conf = new Configuration();
conf.set("property", "value");
Job job = new Job(conf);

Note: all the configuration variables have to be set before the Job instance is created, because Job copies the Configuration it is given; changes made to conf afterwards are not seen by the job.
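The note above can be demonstrated without a Hadoop cluster. This is a toy sketch (FakeJob is a made-up class, not the Hadoop API) that mimics the copy-on-construct behavior of new Job(conf): properties set after the job is created are invisible to it:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Job's constructor copying its Configuration.
// FakeJob is illustrative only; it is not part of Hadoop.
public class JobCopyDemo {

    static class FakeJob {
        private final Map<String, String> conf;

        FakeJob(Map<String, String> conf) {
            // Copy on construction, like new Job(conf) does internally.
            this.conf = new HashMap<>(conf);
        }

        String get(String key) {
            return conf.get(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("property", "value");   // set BEFORE creating the job
        FakeJob job = new FakeJob(conf);
        conf.put("late", "ignored");     // set AFTER: the job never sees it

        System.out.println(job.get("property")); // value
        System.out.println(job.get("late"));     // null
    }
}
```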

Upvotes: 6
