Reputation: 1610
Object should implement Writable
interface in order to be serialized when transmitted in Hadoop. Take the Lucene ScoreDoc
class as an example:
public class ScoreDoc implements java.io.Serializable {
/** The score of this document for the query. */
public float score;
/** Expert: A hit document's number.
* @see Searcher#doc(int) */
public int doc;
/** Only set by {@link TopDocs#merge} */
public int shardIndex;
/** Constructs a ScoreDoc. */
public ScoreDoc(int doc, float score) {
this(doc, score, -1);
}
/** Constructs a ScoreDoc. */
public ScoreDoc(int doc, float score, int shardIndex) {
this.doc = doc;
this.score = score;
this.shardIndex = shardIndex;
}
// A convenience method for debugging.
@Override
public String toString() {
return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
}
}
How should I serialize it with Writable
interface? What is the connection between Writable
and java.io.serializable
interface?
Upvotes: 0
Views: 4296
Reputation: 6169
I think that it wont be a good idea to tamper with the in-built Lucene class. Instead, have your own class which can will contain the fields of ScoreDoc type and would implement Hadoop writable in interface. It would be something like this:
public class MyScoreDoc implements Writable {
private ScoreDoc sd;
public void write(DataOutput out) throws IOException {
String [] splits = sd.toString().split(" ");
// get the score value from the string
Float score = Float.parseFloat((splits[0].split("="))[1]);
// do the same for doc and shardIndex fields
// ....
out.writeInt(score);
out.writeInt(doc);
out.writeInt(shardIndex);
}
public void readFields(DataInput in) throws IOException {
float score = in.readInt();
int doc = in.readInt();
int shardIndex = in.readInt();
sd = new ScoreDoc (score, doc, shardIndex);
}
//String toString()
}
Upvotes: 1
Reputation: 4598
First see Hadoop: Easy way to have object as output value without Writable interface you can use Java serialization OR
See http://developer.yahoo.com/hadoop/tutorial/module5.html you need to make your own write and read function, its quite simple as inside can call the API to read and write int, flaot, string etc
Your example with Writable (need to import it)
public class ScoreDoc implements java.io.Serializable, Writable {
/** The score of this document for the query. */
public float score;//... as in above
public void write(DataOutput out) throws IOException {
out.writeInt(score);
out.writeInt(doc);
out.writeInt(shardIndex);
}
public void readFields(DataInput in) throws IOException {
score = in.readInt();
doc = in.readInt();
shardIndex = in.readInt();
}
//rest toStirng etc
}
Note : order of write and read should be the same or value of one will go to other, and if you have different types will get serialization errors on read
Upvotes: 0