JonyLinux

Reputation: 41

Wrapper around Java primitive types

I'm learning Hadoop and know only the basic concepts of Java. While studying Hadoop, I found that it uses its own types, such as LongWritable, Text, etc., which are extended or wrapped versions of Java's primitive types.

I'm posting this question in the Java community because I think these are the people who can clear up my doubts.

My intention is to understand this concept in general, not just because it is related to Hadoop; it sounds very interesting to me and could be used anywhere, not only in Hadoop.

While reading, I found that Hadoop does this so that it can move data across the network very fast, and that this is done through serialization and deserialization. For this, DataOutput can be used: it takes values of any Java primitive type and converts them to a series of bytes; afterwards DataInput reads those bytes and converts them back to their original values.

My first question is: why does data always need to be converted into bytes for serialization/deserialization? I heard somewhere that bytes are lighter than the actual data; is that the only reason, or is there another one?

Second question: when we serialize and deserialize using, say, the following code

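import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Note: Hadoop ships its own LongWritable (a wrapper around a single long);
// this example class only shares the name.
public class LongWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    // Serialize: write the fields to the output in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize: read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Convenience factory: create an empty instance and fill it from the input.
    public static LongWritable read(DataInput in) throws IOException {
        LongWritable w = new LongWritable();
        w.readFields(in);
        return w;
    }
}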

Here we are using the DataInput and DataOutput types, which refer to objects of classes that implement these interfaces. So my second question is: are these references the byte streams themselves, from which the bytes are read or to which they are written? I'm confused about how the byte stream is generated for read and write operations over the network, as in Hadoop.

Last question: how does the same code work both on the machine where serialization is done and on another machine on the network where deserialization is done once the data arrives? How are serialization and deserialization linked by the same code across the network?

Upvotes: 1

Views: 93

Answers (1)

Serr

Reputation: 333

Why does data always need to be converted into bytes for serialization/deserialization?

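The objective of serialization is to send data somewhere outside your program (your hard drive, or another program on another machine). Disks and networks only move bytes; they know nothing about Java objects, so the data has to be turned into a universal low-level representation such as bytes before it can be stored or transported. It is not that bytes "weigh less" than the data. A compact binary encoding can be smaller than, say, a text encoding, but that is a bonus rather than the reason.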
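To make this concrete, here is a minimal sketch that turns an int and a long into the bytes that would actually be stored or sent. The class name PrimitiveToBytes and the method encode are made up for illustration; only the java.io classes are real.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class, not part of Hadoop or the JDK.
class PrimitiveToBytes {
    // Encodes an int followed by a long, the same layout as write() in the question.
    static byte[] encode(int counter, long timestamp) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(counter);    // 4 bytes, most significant byte first
            out.writeLong(timestamp); // 8 bytes, most significant byte first
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1, 2L);
        // prints "12 bytes: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2]"
        System.out.println(bytes.length + " bytes: " + Arrays.toString(bytes));
    }
}
```

The 12 bytes are all that a disk or network ever sees; the receiving side has to know that the first 4 are an int and the next 8 are a long.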

_

Are these references the byte streams themselves, from which the bytes are read or to which they are written? I'm confused about how the byte stream is generated for read and write operations over the network, as in Hadoop.

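Not quite. DataInput and DataOutput are interfaces, and the objects behind them are ordinary Java classes that wrap a byte stream rather than being the raw bytes themselves. For example, DataOutputStream wraps any OutputStream (a file, a socket, an in-memory ByteArrayOutputStream) and DataInputStream wraps any InputStream. A call such as writeLong(x) splits the value into 8 bytes and hands them to the wrapped stream; readLong() reads 8 bytes back and reassembles them. Whoever calls write() or readFields() decides where the bytes go: in Hadoop, as far as I know, the framework creates the streams over its network connections or files and passes them to your Writable. You can read the source of DataInputStream and DataOutputStream to see exactly how each primitive is encoded.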
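A small sketch of that wrapping, assuming a stand-in class called Record that mirrors the question's fields but does not depend on Hadoop (Record, StreamWrappingDemo and roundTrip are illustrative names only). The byte array plays the role of the network here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class StreamWrappingDemo {
    // Hypothetical stand-in for the question's class; no Hadoop dependency.
    static class Record {
        int counter;
        long timestamp;

        void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        void readFields(DataInput in) throws IOException {
            counter = in.readInt();
            timestamp = in.readLong();
        }
    }

    // Serializes a record into a byte array, then rebuilds a new record from it.
    static Record roundTrip(Record original) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // DataOutputStream is the DataOutput; 'sink' is the actual byte stream.
        original.write(new DataOutputStream(sink));
        byte[] wire = sink.toByteArray();

        Record copy = new Record();
        // DataInputStream is the DataInput; it pulls bytes from the wrapped stream.
        copy.readFields(new DataInputStream(new ByteArrayInputStream(wire)));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Record r = new Record();
        r.counter = 7;
        r.timestamp = 123456789L;
        Record copy = roundTrip(r);
        System.out.println(copy.counter + " " + copy.timestamp);
    }
}
```

Swapping the ByteArrayOutputStream for a socket's or a file's stream would change where the bytes end up without touching write() or readFields().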

_

How does the same code work both on the machine where serialization is done and on another machine on the network where deserialization is done once the data arrives?

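To deserialize an object, the destination needs the same class on its classpath as the one used to serialize it. Nothing links the two machines except the bytes themselves: the sender runs write(), the receiver creates an empty instance and runs readFields(), and the two methods must agree on the order and types of the fields. If you use Java's built-in serialization (java.io.Serializable) instead of Writable, it is recommended that you declare a serialVersionUID so that a mismatch between the class versions on source and destination is detected when deserializing, rather than giving unexpected results, like: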

private static final long serialVersionUID = 3770035753852147836L;
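As a rough illustration of the "same code on both sides" idea, the sketch below sends the two fields over a local TCP connection: one thread plays the sending machine, the calling thread plays the receiving machine. The class name LoopbackTransfer and the method sendAndReceive are invented for this example, and a real Hadoop cluster handles connections in its own way; this only shows the principle.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of Hadoop.
class LoopbackTransfer {
    // Sends (counter, timestamp) over a loopback socket and returns what the receiver read.
    static long[] sendAndReceive(int counter, long timestamp) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            int port = server.getLocalPort();

            // "Machine A": serializes by writing the fields in a fixed order.
            Thread sender = new Thread(() -> {
                try (Socket socket = new Socket(loopback, port);
                     DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
                    out.writeInt(counter);
                    out.writeLong(timestamp);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // "Machine B": deserializes by reading the fields in the same order.
            try (Socket socket = server.accept();
                 DataInputStream in = new DataInputStream(socket.getInputStream())) {
                int receivedCounter = in.readInt();
                long receivedTimestamp = in.readLong();
                sender.join();
                return new long[] {receivedCounter, receivedTimestamp};
            }
        }
    }

    public static void main(String[] args) throws Exception {
        long[] received = sendAndReceive(42, 1700000000000L);
        System.out.println(received[0] + " " + received[1]);
    }
}
```

The only thing the two sides share is the agreement on order and types (int first, then long); if the receiver read a long first, it would silently get garbage.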

Upvotes: 0
