Reputation: 349
I am trying to implement a use case from the book Hadoop in Action, but I am not able to compile the code. I am new to Java, so I am unable to understand the exact reasons behind the errors.
The interesting thing is that another piece of code using the same classes and methods compiles successfully.
hadoop@hadoopnode1:~/hadoop-0.20.2/playground/src$ javac -classpath /home/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/hadoop/hadoop-0.20.2/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar -d ../classes DataJoin2.java
DataJoin2.java:49: cannot find symbol
symbol : constructor TaggedWritable(org.apache.hadoop.io.Text)
location: class DataJoin2.TaggedWritable
TaggedWritable retv = new TaggedWritable((Text) value);
^
DataJoin2.java:69: cannot find symbol
symbol : constructor TaggedWritable(org.apache.hadoop.io.Text)
location: class DataJoin2.TaggedWritable
TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
^
DataJoin2.java:113: setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Mapper>) in org.apache.hadoop.mapreduce.Job cannot be applied to (java.lang.Class<DataJoin2.MapClass>)
job.setMapperClass(MapClass.class);
^
DataJoin2.java:114: setReducerClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Reducer>) in org.apache.hadoop.mapreduce.Job cannot be applied to (java.lang.Class<DataJoin2.Reduce>)
job.setReducerClass(Reduce.class);
^
4 errors
----------------code----------------------
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
// DataJoin Classes
import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
public class DataJoin2
{
    public static class MapClass extends DataJoinMapperBase
    {
        protected Text generateInputTag(String inputFile)
        {
            String datasource = inputFile.split("-")[0];
            return new Text(datasource);
        }

        protected Text generateGroupKey(TaggedMapOutput aRecord)
        {
            String line = ((Text) aRecord.getData()).toString();
            String[] tokens = line.split(",");
            String groupKey = tokens[0];
            return new Text(groupKey);
        }

        protected TaggedMapOutput generateTaggedMapOutput(Object value)
        {
            TaggedWritable retv = new TaggedWritable((Text) value);
            retv.setTag(this.inputTag);
            return retv;
        }
    } // End of class MapClass

    public static class Reduce extends DataJoinReducerBase
    {
        protected TaggedMapOutput combine(Object[] tags, Object[] values)
        {
            if (tags.length < 2) return null;
            String joinedStr = "";
            for (int i = 0; i < values.length; i++)
            {
                if (i > 0) joinedStr += ",";
                TaggedWritable tw = (TaggedWritable) values[i];
                String line = ((Text) tw.getData()).toString();
                String[] tokens = line.split(",", 2);
                joinedStr += tokens[1];
            }
            TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
            retv.setTag((Text) tags[0]);
            return retv;
        }
    } // End of class Reduce

    public static class TaggedWritable extends TaggedMapOutput
    {
        private Writable data;

        public TaggedWritable()
        {
            this.tag = new Text("");
            this.data = data;
        }

        public Writable getData()
        {
            return data;
        }

        public void write(DataOutput out) throws IOException
        {
            this.tag.write(out);
            this.data.write(out);
        }

        public void readFields(DataInput in) throws IOException
        {
            this.tag.readFields(in);
            this.data.readFields(in);
        }
    } // End of class TaggedWritable

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: DataJoin2 <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "DataJoin");
        job.setJarByClass(DataJoin2.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TaggedWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Upvotes: 1
Views: 1594
Reputation: 97
I have hadoop-2.7.1; for me it worked to add the dependency from Maven in the pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-datajoin</artifactId>
    <version>2.7.1</version>
</dependency>
This is the URL for hadoop-datajoin: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-datajoin
Upvotes: 0
Reputation: 241601
For your first two error messages, the compiler errors are clearly telling you that you don't have a constructor for TaggedWritable that accepts an argument of type Text. It appears to me that you are making TaggedWritable serve as a wrapper around a Writable to add a tag, so may I suggest adding a constructor:
public TaggedWritable(Writable data) {
    this.tag = new Text("");
    this.data = data;
}
In fact, as you've written it, the line this.data = data; just reassigns data to itself, so I'm pretty sure you intended to have a constructor argument named data. See my reasoning above for why I think you should make it Writable instead of Text. Since Text implements Writable, this will resolve your first two error messages.
However, you will need to keep a default no-arg constructor. This is because Hadoop uses reflection to instantiate instances of your Writable values as it serializes them across the network between the map and reduce phases. I think you have a tiny bit of a mess here in the default no-arg constructor:
public TaggedWritable() {
    this.tag = new Text("");
}
The reason I see this as a mess is that if you don't assign a valid instance of whatever your wrapped Writable values are to TaggedWritable.data, you will get a NullPointerException when this.data.readFields(in) is invoked in TaggedWritable.readFields(DataInput). Since it's a general wrapper, you should probably make TaggedWritable a generic type and then use reflection to assign to TaggedWritable.data in the default no-arg constructor.
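As an illustration of that last point, here is a minimal sketch (my own, not code from the book) that records the wrapped class name in write() and recreates the instance reflectively in readFields(), so the no-arg constructor no longer needs to know the concrete type. It assumes org.apache.hadoop.util.ReflectionUtils is on the classpath:
public static class TaggedWritable extends TaggedMapOutput
{
    private Writable data;

    public TaggedWritable()
    {
        this.tag = new Text(""); // data stays null until readFields() recreates it
    }

    public TaggedWritable(Writable data)
    {
        this.tag = new Text("");
        this.data = data;
    }

    public Writable getData()
    {
        return data;
    }

    public void write(DataOutput out) throws IOException
    {
        this.tag.write(out);
        out.writeUTF(this.data.getClass().getName()); // record the concrete type
        this.data.write(out);
    }

    public void readFields(DataInput in) throws IOException
    {
        this.tag.readFields(in);
        String dataClz = in.readUTF();
        try {
            // instantiate the wrapped Writable reflectively before reading into it
            if (this.data == null || !this.data.getClass().getName().equals(dataClz)) {
                this.data = (Writable) ReflectionUtils.newInstance(Class.forName(dataClz), null);
            }
            this.data.readFields(in);
        } catch (ClassNotFoundException e) {
            throw new IOException("Cannot create wrapped Writable of type " + dataClz, e);
        }
    }
}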
For your last two compiler errors, I note that to use hadoop-datajoin you need to be using the old API classes. Thus, all of these
org.apache.hadoop.mapreduce.Job;
org.apache.hadoop.mapreduce.Mapper;
org.apache.hadoop.mapreduce.Reducer;
org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
should be replaced by their old API equivalents: org.apache.hadoop.mapred.JobConf instead of org.apache.hadoop.mapreduce.Job, and so on. That will handle your last two error messages.
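As a rough sketch of that conversion (my own illustration, assuming Hadoop 0.20.x and the class names from your posted code), the driver rewritten against the old org.apache.hadoop.mapred API would look something like this:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class DataJoin2Driver
{
    public static void main(String[] args) throws Exception
    {
        JobConf job = new JobConf(DataJoin2.class);
        job.setJobName("DataJoin");

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // DataJoinMapperBase/DataJoinReducerBase implement the old-API
        // Mapper/Reducer interfaces, so these calls now type-check
        job.setMapperClass(DataJoin2.MapClass.class);
        job.setReducerClass(DataJoin2.Reduce.class);

        job.setInputFormat(TextInputFormat.class); // old API: setInputFormat, not setInputFormatClass
        job.setOutputFormat(TextOutputFormat.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataJoin2.TaggedWritable.class);

        JobClient.runJob(job);
    }
}
Note that the old API submits the job through JobClient.runJob(JobConf) rather than Job.waitForCompletion(boolean).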
Upvotes: 1
Reputation: 5239
There is nothing ambiguous about the error message. It is telling you that you did not provide a constructor for TaggedWritable which takes an argument of type Text. You only show a no-arg constructor in the code you posted.
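For instance, a minimal sketch of such a constructor (assuming the field stays a Writable, as in your posted code) would be:
public TaggedWritable(Text data)
{
    this.tag = new Text("");
    this.data = data;
}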
Upvotes: 1