rekha gupta

Reputation: 27

How to serialize a List collection object in Hadoop?

Is there a way to serialize Java collections in Hadoop?

The Writable interface is for Java primitives only. I have the following class attributes:

private String keywords;
private List<Status> tweets;
private long queryTime = 0;

public TweetStatus(String keys, List<Status> tweets, long queryTime){
    this.keywords = keys;
    this.tweets = tweets;
    this.queryTime = queryTime;
}

How can I serialize the List object?

Upvotes: 2

Views: 2443

Answers (3)

jason

Reputation: 241621

The Writable interface is for Java primitives only.

Right. Basically you need to break down your object into a sequence of objects that you can serialize.

So, from first principles, to serialize a list you need to serialize the size of the list and then serialize each element of the list. This way, when you need to deserialize, you know how many elements you need to deserialize.

Something like this should get you on the write (pun!) track:

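import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

class TweetStatusWritable implements Writable {
    private String keywords;
    private List<Status> tweets;
    private long queryTime;

    // add getters for the above three fields

    // Hadoop creates Writable instances reflectively, which needs a no-argument constructor
    public TweetStatusWritable() {
    }

    public TweetStatusWritable(
        String keywords,
        List<Status> tweets,
        long queryTime
    ) {
        this.keywords = keywords;
        this.tweets = tweets;
        this.queryTime = queryTime;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.keywords = in.readUTF();
        // the size was written first, so we know how many elements to read back
        int size = in.readInt();
        // List is an interface, so instantiate a concrete implementation
        this.tweets = new ArrayList<Status>(size);
        for (int i = 0; i < size; i++) {
            Status status = // deserialize an instance of Status
            tweets.add(status);
        }
        this.queryTime = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.keywords);
        // write the size first so that readFields knows how many elements follow
        out.writeInt(this.tweets.size());
        for (Status status : this.tweets) {
            // serialize status onto out
        }
        out.writeLong(this.queryTime);
    }
}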
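
To make the size-then-elements idea concrete, here is a minimal, self-contained sketch that round-trips a list through plain java.io streams. It deliberately avoids the Hadoop dependency, so the classes only mimic the write/readFields shape of a Writable. SimpleStatus is a hypothetical stand-in for Status (I am assuming an id and a text field, since the real Status type isn't shown), and TweetBatch and RoundTripDemo are names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Status: just an id and the tweet text.
class SimpleStatus {
    long id;
    String text;

    SimpleStatus() { }

    SimpleStatus(long id, String text) {
        this.id = id;
        this.text = text;
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(text);
    }

    void readFields(DataInput in) throws IOException {
        id = in.readLong();
        text = in.readUTF();
    }
}

// Same write/readFields shape as a Writable, minus the Hadoop interface.
class TweetBatch {
    String keywords;
    List<SimpleStatus> tweets = new ArrayList<SimpleStatus>();
    long queryTime;

    TweetBatch() { }

    TweetBatch(String keywords, List<SimpleStatus> tweets, long queryTime) {
        this.keywords = keywords;
        this.tweets = tweets;
        this.queryTime = queryTime;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(keywords);
        out.writeInt(tweets.size());          // length prefix
        for (SimpleStatus status : tweets) {
            status.write(out);                // each element serializes itself
        }
        out.writeLong(queryTime);
    }

    void readFields(DataInput in) throws IOException {
        keywords = in.readUTF();
        int size = in.readInt();              // read the length prefix back
        tweets = new ArrayList<SimpleStatus>(size);
        for (int i = 0; i < size; i++) {
            SimpleStatus status = new SimpleStatus();
            status.readFields(in);
            tweets.add(status);
        }
        queryTime = in.readLong();
    }
}

class RoundTripDemo {
    static byte[] toBytes(TweetBatch batch) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        batch.write(out);
        out.flush();
        return buffer.toByteArray();
    }

    static TweetBatch fromBytes(byte[] bytes) throws IOException {
        TweetBatch batch = new TweetBatch();
        batch.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return batch;
    }

    public static void main(String[] args) throws IOException {
        TweetBatch original = new TweetBatch(
            "hadoop",
            Arrays.asList(new SimpleStatus(1L, "first"), new SimpleStatus(2L, "second")),
            1234L);
        TweetBatch copy = fromBytes(toBytes(original));
        // prints "hadoop 2 second 1234"
        System.out.println(copy.keywords + " " + copy.tweets.size() + " "
            + copy.tweets.get(1).text + " " + copy.queryTime);
    }
}
```

In a real job the element type would implement Writable itself, and the two loops would simply delegate to its write and readFields methods. The fields must be read back in exactly the order they were written.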

Upvotes: 3

Tariq

Reputation: 34184

If you have a lot of serialization work ahead, you might find Avro useful.

Upvotes: 0

Chris Gerken

Reputation: 16392

Take a look at ArrayWritable. It lets you serialize an array of instances (all of the same type, and each one itself a Writable). You could build one of those from your List.

Upvotes: 0
