Isaac Lewis

Reputation: 43

Hadoop MultipleOutputs.addNamedOutput throws "cannot find symbol"

I'm using Hadoop 0.20.203.0. I want to output to two different files, so I'm trying to get MultipleOutputs working.

Here's my configuration method:

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();

  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: indycascade <in> <out>");
    System.exit(2);
  }
  Job job = new Job(conf, "indy cascade");
  job.setJarByClass(IndyCascade.class);
  job.setMapperClass(ICMapper.class);
  job.setCombinerClass(ICReducer.class);
  job.setReducerClass(ICReducer.class);

  TextInputFormat.addInputPath(job, new Path(otherArgs[0]));
  TextOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

  MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class, LongWritable.class, Text.class);

  job.waitForCompletion(true);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

However, this won't compile. The offending line is the MultipleOutputs.addNamedOutput(...) call, which fails with a "cannot find symbol" error:

isaac/me/saac/i/IndyCascade.java:94: cannot find symbol
symbol  : method addNamedOutput(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.Class<org.apache.hadoop.mapreduce.lib.output.TextOutputFormat>,java.lang.Class<org.apache.hadoop.io.LongWritable>,java.lang.Class<org.apache.hadoop.io.Text>)
location: class org.apache.hadoop.mapred.lib.MultipleOutputs
    MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class, LongWritable.class, Text.class);

Of course, I tried using a JobConf instead of Configuration, as the API demands, but that leads to the same error. Additionally, JobConf is deprecated.

How do I get MultipleOutputs to work? Is that even the correct class to use?

Upvotes: 0

Views: 2958

Answers (1)

Chris White

Reputation: 30089

You're mixing old and new API types:

You're using the old API org.apache.hadoop.mapred.lib.MultipleOutputs:

location: class org.apache.hadoop.mapred.lib.MultipleOutputs

With the new API org.apache.hadoop.mapreduce.lib.output.TextOutputFormat:

symbol  : method addNamedOutput(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.Class<org.apache.hadoop.mapreduce.lib.output.TextOutputFormat>,java.lang.Class<org.apache.hadoop.io.LongWritable>,java.lang.Class<org.apache.hadoop.io.Text>)

Make the APIs consistent and you should be OK.

Edit: In fact, 0.20.203 doesn't have a port of MultipleOutputs for the new API, so you'll have to use the old API, find a new-API port online (Cloudera 0.20.2+320), or port it yourself.
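For reference, a minimal old-API driver might look like the sketch below. IdentityMapper/IdentityReducer are stand-ins only: your ICMapper/ICReducer would have to be rewritten against the org.apache.hadoop.mapred interfaces before you could plug them in here.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class IndyCascadeOldApi {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(IndyCascadeOldApi.class);
    conf.setJobName("indy cascade");

    // Placeholders: real mapper/reducer classes must implement the old
    // org.apache.hadoop.mapred.Mapper/Reducer interfaces
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Old-API MultipleOutputs takes a JobConf and an old-API OutputFormat
    MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class,
        LongWritable.class, Text.class);

    JobClient.runJob(conf);
  }
}

Here everything comes from org.apache.hadoop.mapred, so the addNamedOutput signature matches and the "cannot find symbol" error goes away.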

Also, you should look at the ToolRunner class to execute your jobs, it will remove the need to explicitly call the GenericOptionsParser:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Driver(), args));
  }

  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: indycascade <in> <out>");
      System.exit(2);
    }

    // getConf() already has the generic options applied by ToolRunner
    Job job = new Job(getConf());
    Configuration conf = job.getConfiguration();

    // insert other job set up here

    return job.waitForCompletion(true) ? 0 : 1;
  }
}

Final point: any reference to conf after you create the Job instance points at the original conf. Job makes a deep copy of the conf object, so calling MultipleOutputs.addNamedOutput(conf, ...) will not have the desired effect; use MultipleOutputs.addNamedOutput(job.getConfiguration(), ...) instead. See my example code above for the correct way to do this.
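A small sketch of the copy behaviour ("some.key" is just a made-up property name):

Configuration conf = new Configuration();
Job job = new Job(conf, "indy cascade");   // Job copies conf internally here

conf.set("some.key", "value");             // only updates the original object
// the job never sees it: job.getConfiguration().get("some.key") is null

job.getConfiguration().set("some.key", "value");  // this is what the job reads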

Upvotes: 4
