Reputation: 27
Is there a way to serialize Java collections in Hadoop?
The Writable interface is for Java primitives only. I have the following class attributes:
private String keywords;
private List<Status> tweets;
private long queryTime = 0;

public TweetStatus(String keys, List<Status> tweets, long queryTime) {
    this.keywords = keys;
    this.tweets = tweets;
    this.queryTime = queryTime;
}
How can I serialize the List object?
Upvotes: 2
Views: 2443
Reputation: 241621
"The Writable interface is for Java primitives only."
Right. Basically you need to break down your object into a sequence of objects that you can serialize.
So, from first principles, to serialize a list you need to serialize the size of the list and then serialize each element of the list. This way, when you need to deserialize, you know how many elements you need to deserialize.
Something like this should get you on the write (pun!) track:
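import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

class TweetStatusWritable implements Writable {
    private String keywords;
    private List<Status> tweets;
    private long queryTime;
    // add getters for the above three fields

    // Hadoop creates Writable instances reflectively, so a no-arg constructor is needed
    public TweetStatusWritable() {
    }

    public TweetStatusWritable(String keywords, List<Status> tweets, long queryTime) {
        this.keywords = keywords;
        this.tweets = tweets;
        this.queryTime = queryTime;
    }

    // Fields must be read in exactly the order write() emits them
    @Override
    public void readFields(DataInput in) throws IOException {
        this.keywords = in.readUTF();
        int size = in.readInt();
        // List is an interface, so instantiate a concrete implementation
        this.tweets = new ArrayList<Status>(size);
        for (int i = 0; i < size; i++) {
            Status status = // deserialize an instance of Status
            tweets.add(status);
        }
        this.queryTime = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.keywords);
        // Write the element count first so readFields() knows how many to expect
        out.writeInt(this.tweets.size());
        for (Status status : this.tweets) {
            // serialize status onto out
        }
        out.writeLong(queryTime);
    }
}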
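The size-then-elements scheme can be tried without a Hadoop installation, since DataInput and DataOutput are plain java.io interfaces. Below is a minimal sketch of a round trip through an in-memory buffer. It rests on two assumptions: String stands in for Status (how to encode a Status is not covered here), and the class name ListRoundTrip is made up for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; String is a stand-in for the Status element type.
class ListRoundTrip {

    // Write the element count first, then each element in order.
    static byte[] write(List<String> items) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeUTF(item);
            }
        }
        return buffer.toByteArray();
    }

    // Read the count back, then exactly that many elements.
    static List<String> read(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int size = in.readInt();
            List<String> items = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                items.add(in.readUTF());
            }
            return items;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> tweets = Arrays.asList("first tweet", "second tweet");
        List<String> copy = read(write(tweets));
        System.out.println(copy.equals(tweets)); // prints "true"
    }
}
```

The same two loops slot into write() and readFields() of a Writable once you have decided how a single element is encoded.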
Upvotes: 3
Reputation: 34184
If you have a lot of serialization stuff ahead, you might find Avro useful.
Upvotes: 0
Reputation: 16392
Take a look at ArrayWritable. It lets you serialize an array of instances, all of the same type. You could build one of those from your List, provided the elements are themselves Writable.
Upvotes: 0