USB

Reputation: 6139

Custom Writable class in Hadoop for multiple double values

I am trying to emit 4 numeric values as a key. I wrote a custom WritableComparable class for this, but I am stuck with the compare() method. There are several solutions mentioned on Stack Overflow, but none of them solved my issue.

My WritableComparable class is

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class DimensionWritable implements WritableComparable {
    private double keyRow;
    private double keyCol;

    private double valRow;
    private double valCol;


    public DimensionWritable(double keyRow, double keyCol, double valRow, double valCol) {
        set(keyRow, keyCol, valRow, valCol);
    }
    public void set(double keyRow, double keyCol, double valRow, double valCol) {
        //row dimension
        this.keyRow = keyRow;
        this.keyCol = keyCol;
        //column dimension
        this.valRow = valRow;
        this.valCol = valCol;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(keyRow);
        out.writeDouble(keyCol);

        out.writeDouble(valRow);
        out.writeDouble(valCol);
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        keyRow = in.readDouble();
        keyCol = in.readDouble();

        valRow = in.readDouble();
        valCol = in.readDouble();
    }
    /**
     * @return the keyRow
     */
    public double getKeyRow() {
        return keyRow;
    }
    /**
     * @param keyRow the keyRow to set
     */
    public void setKeyRow(double keyRow) {
        this.keyRow = keyRow;
    }
    /**
     * @return the keyCol
     */
    public double getKeyCol() {
        return keyCol;
    }
    /**
     * @param keyCol the keyCol to set
     */
    public void setKeyCol(double keyCol) {
        this.keyCol = keyCol;
    }
    /**
     * @return the valRow
     */
    public double getValRow() {
        return valRow;
    }
    /**
     * @param valRow the valRow to set
     */
    public void setValRow(double valRow) {
        this.valRow = valRow;
    }
    /**
     * @return the valCol
     */
    public double getValCol() {
        return valCol;
    }
    /**
     * @param valCol the valCol to set
     */
    public void setValCol(double valCol) {
        this.valCol = valCol;
    }

    // compareTo - I am confused about what to implement here

}

What exactly is the logic behind the compare method? It is what Hadoop uses to sort and exchange keys, right?

How do I implement it for the above 4 double values?

UPDATE: I edited my code as "isnot2bad" suggested, but it is now throwing

java.lang.Exception: java.lang.RuntimeException: java.lang.NoSuchMethodException: edu.am.bigdata.svmmodel.DimensionWritable.<init>()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: edu.am.bigdata.svmmodel.DimensionWritable.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
    at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:113)
    at org.apache.hadoop.io.WritableComparator.<init>(WritableComparator.java:99)
    at org.apache.hadoop.io.WritableComparator.get(WritableComparator.java:55)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:819)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:836)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoSuchMethodException: edu.am.bigdata.svmmodel.DimensionWritable.<init>()
    at java.lang.Class.getConstructor0(Class.java:2721)
    at java.lang.Class.getDeclaredConstructor(Class.java:2002)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)

Am I doing anything wrong?

Upvotes: 1

Views: 1064

Answers (1)

isnot2bad

Reputation: 24444

If you want to use your type as a key in Hadoop, it has to be comparable, i.e. your type must be totally ordered: any two instances a and b of DimensionWritable must either be equal, or a must be greater than or less than b (whatever that means is up to the implementation).

By implementing compareTo you define how instances are naturally ordered. This is done by comparing the fields of the two instances, one by one (note that this signature assumes the class is declared as implements WritableComparable&lt;DimensionWritable&gt;):

@Override
public int compareTo(DimensionWritable o) {
    int c = Double.compare(this.keyRow, o.keyRow);
    if (c != 0) return c;
    c = Double.compare(this.keyCol, o.keyCol);
    if (c != 0) return c;
    c = Double.compare(this.valRow, o.valRow);
    if (c != 0) return c;
    c = Double.compare(this.valCol, o.valCol);
    return c;
}
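
The chain above gives a lexicographic ordering over the four fields: the first differing field decides, and later fields only matter on ties. A minimal standalone sketch (a hypothetical demo class, plain Java with no Hadoop dependency) illustrates the tie-breaking:

```java
// Hypothetical demo: the same field-by-field tie-breaking as compareTo above.
public class ChainedCompareDemo {

    // Compares two 4-field tuples lexicographically, like DimensionWritable.
    static int compare(double[] a, double[] b) {
        for (int i = 0; i < 4; i++) {
            int c = Double.compare(a[i], b[i]);
            if (c != 0) return c;   // first differing field decides
        }
        return 0;                   // all fields equal
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {1.0, 2.0, 3.0, 5.0};
        System.out.println(compare(x, y) < 0);  // true: last field breaks the tie
        System.out.println(compare(x, x) == 0); // true: identical tuples are equal
    }
}
```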

Note that hashCode must also be implemented: it must be consistent with your definition of equality (two instances that are equal according to compareTo should have the same hash code), and Hadoop requires the hash code of a key to be constant across different JVMs. So again we use the fields to compute the hash code:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + Double.hashCode(keyRow);
    result = prime * result + Double.hashCode(keyCol);
    result = prime * result + Double.hashCode(valRow);
    result = prime * result + Double.hashCode(valCol);
    return result;
}
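
Since the hash code must agree with the definition of equality, it can also help to override equals over the same four fields. A self-contained sketch (a hypothetical simplified class, plain Java, not the full Hadoop Writable) showing the equals/hashCode pairing:

```java
// Hypothetical standalone sketch: equals and hashCode defined over the
// same four fields, mirroring the compareTo ordering above.
public class DimensionKeyDemo {
    private final double keyRow, keyCol, valRow, valCol;

    public DimensionKeyDemo(double keyRow, double keyCol, double valRow, double valCol) {
        this.keyRow = keyRow;
        this.keyCol = keyCol;
        this.valRow = valRow;
        this.valCol = valCol;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof DimensionKeyDemo)) return false;
        DimensionKeyDemo o = (DimensionKeyDemo) obj;
        // Equal exactly when compareTo over the same fields would return 0.
        return Double.compare(keyRow, o.keyRow) == 0
            && Double.compare(keyCol, o.keyCol) == 0
            && Double.compare(valRow, o.valRow) == 0
            && Double.compare(valCol, o.valCol) == 0;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + Double.hashCode(keyRow);
        result = prime * result + Double.hashCode(keyCol);
        result = prime * result + Double.hashCode(valRow);
        result = prime * result + Double.hashCode(valCol);
        return result;
    }

    public static void main(String[] args) {
        DimensionKeyDemo a = new DimensionKeyDemo(1, 2, 3, 4);
        DimensionKeyDemo b = new DimensionKeyDemo(1, 2, 3, 4);
        // Equal instances must produce equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```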

Upvotes: 8
