Defining data types in Hadoop

Question

I am writing a Hadoop program in Java, and I want to define my own data type in it. Here is the reference

And here is my code:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.HashMap;

import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput; 
import java.io.DataOutput; 
import java.io.IOException; 
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Co {

    public class Middle implements WritableComparable {

        public int number;
        public String page;

        Middle() {
            number = -1;
            page = "";
        }

        public void write(DataOutput out) throws IOException {
            out.writeInt(number);
            out.writeUTF(page);
        }

        public void readFields(DataInput in) throws IOException {
            number = in.readInt();
            page = in.readUTF();
        }

        public int compareTo(Middle o) {
            int thisValue = this.value;
            int thatValue = o.value;
            return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
        }

    }

    public static class TokenizerMapper extends Mapper {
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            Middle temp = new Middle();
            temp.page = "1";
            temp.number = 1;
            context.write(new Text("A"), temp);
        }
    }

    public static class IntSumReducer extends Reducer {

        public void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

            context.write(new Text("A"), new DoubleWritable(0.0));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Co.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(Mycombiner.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Middle.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

However, even though I copied the code from the reference exactly, I get errors when I compile the program. Here are the errors:

The information is here

So, can anyone please tell me what I should do? Thanks!

Davis Broda · Accepted Answer

In order of their appearance in the terminal:

WritableComparable needs to have a type parameter specified. Try the change below in the class declaration, and see if it works.

public class Middle implements WritableComparable<Middle> {

If that doesn't work, try making the compareTo method take Object as an argument instead of Middle.

For the first "cannot find symbol" error: this occurs because there is no field called value in the class. I suspect that this should be this.number instead of this.value.

The same applies to the second "cannot find symbol" error, which should use o.number instead of o.value.
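The type-parameter and field-name fixes can be sketched together as follows. This is a minimal sketch, not the asker's final code: the local WritableComparable interface is a stand-in declared only so the example compiles without Hadoop on the classpath (in a real job you would import org.apache.hadoop.io.WritableComparable instead), and MiddleDemo and roundTrip are hypothetical names added for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.WritableComparable<T>, declared here only
// so the sketch compiles without Hadoop (an assumption of this example).
interface WritableComparable<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Top-level class with a type argument on the interface, so that
// compareTo(Middle) actually implements the interface method.
class Middle implements WritableComparable<Middle> {
    public int number;
    public String page;

    // No-argument constructor, which Hadoop uses to create instances reflectively.
    public Middle() {
        number = -1;
        page = "";
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(number);
        out.writeUTF(page);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        number = in.readInt();
        page = in.readUTF();
    }

    // Compares on the existing 'number' field rather than the missing 'value'.
    @Override
    public int compareTo(Middle o) {
        return Integer.compare(this.number, o.number);
    }
}

// Hypothetical helper class, used only to exercise Middle.
class MiddleDemo {
    // Serializes a Middle to bytes and reads it back into a fresh instance.
    static Middle roundTrip(Middle in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        in.write(new DataOutputStream(bytes));
        Middle out = new Middle();
        out.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return out;
    }

    public static void main(String[] args) throws IOException {
        Middle a = new Middle();
        a.number = 1;
        a.page = "1";
        Middle copy = roundTrip(a);
        System.out.println(copy.number + " " + copy.page + " " + a.compareTo(copy));
    }
}
```

The round trip through write and readFields is roughly what happens to a value between the map and reduce phases, so it is a quick way to check that the two methods read and write the fields in the same order.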

The final error is due to the fact that your Middle class is defined as a non-static nested class (an inner class). An inner class can only be instantiated through an instance of its outer class. However, you never instantiate the Co class, and the mapper that creates Middle is itself a static nested class, so there is no enclosing Co instance to create Middle from. Try either moving Middle to another class, or instantiating Co in the main method and having the rest of the code take place in a non-static run(String[] args) method. See Oracle's explanation of nested classes for more information.
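The difference between the two kinds of nested class can be seen in a small sketch that needs no Hadoop at all. Outer, Inner, Nested, and fromStaticContext are hypothetical names used only for illustration. The sketch also shows a third option not mentioned above, which is to declare the nested class static so that it no longer needs an enclosing instance.

```java
class Outer {
    // Inner (non-static) class: every instance is tied to an Outer instance.
    class Inner {
        int number = 1;
    }

    // Static nested class: no enclosing instance is required.
    static class Nested {
        int number = 2;
    }

    static int fromStaticContext() {
        // Inner bad = new Inner();               // does not compile in a static method
        Inner viaOuter = new Outer().new Inner(); // needs an explicit Outer instance
        Nested direct = new Nested();             // can be created directly
        return viaOuter.number + direct.number;
    }

    public static void main(String[] args) {
        System.out.println(fromStaticContext());
    }
}
```

In the question's code, TokenizerMapper plays the role of the static context and Middle plays the role of Inner, which is why new Middle() is rejected there.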

If you want a better example of a WritableComparable class, check here
