Reputation: 31

What does "FsDataInputStream in turn wraps a DFSInputStream" mean in anatomy of file read in Hadoop

I am pretty new here and this is my first question. Apologies if I have done something wrong.

I have been reading Hadoop: The Definitive Guide by Tom White. In chapter 3, "The Hadoop Distributed Filesystem", under the anatomy of a file read, I am unable to understand what it means that "FsDataInputStream in turn wraps a DFSInputStream which manages the datanode and namenode I/O." Please check this for reference: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-3/data-flow

I am really confused. A simple explanation would be greatly appreciated.

Thanks

Upvotes: 2

Answers (1)

Remus Rusanu

Reputation: 294437

In Java a DataInputStream is a specialization of InputStream:

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.

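In Hadoop, the same relationship holds between FSDataInputStream and FSInputStream. FSDataInputStream is the wrapper, and for HDFS the stream it wraps is a DFSInputStream (a kind of FSInputStream), which is the part that manages the datanode and namenode I/O.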

When a stream 'wraps' another stream, any operation performed on the wrapper is translated into operations on the wrapped stream. So a DataInputStream can offer data semantics (e.g. reading Java primitive types) by reading bytes from the wrapped InputStream and interpreting them according to the serialization rules for Java primitives. You could read the InputStream yourself, but then you would have to know the serialization rules and decode the raw bytes into Java types. Notice that DataInputStream works on any InputStream: it is concerned only with the serialization rules, not with where the bytes come from. The wrapped stream can be a file, a memory region, or a network connection.
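To make the wrapping idea concrete, here is a minimal sketch using only the plain java.io classes, not Hadoop itself. The class name WrapDemo and its helper methods are hypothetical and exist only for illustration. It decodes the same four bytes twice: once by hand from a raw InputStream, and once through a DataInputStream that wraps it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Hadoop or the JDK.
class WrapDemo {

    // Serialize an int to raw bytes with a DataOutputStream that wraps
    // an in-memory byte stream.
    static byte[] encode(int value) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(raw)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    // Without a wrapper: read four bytes and assemble them yourself.
    // You have to know that the int is stored big-endian.
    static int readIntByHand(InputStream in) {
        try {
            int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
            if ((b0 | b1 | b2 | b3) < 0) {
                throw new EOFException(); // read() returned -1: stream ended early
            }
            return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // With a wrapper: DataInputStream pulls bytes from the wrapped stream
    // and applies the decoding rules for us.
    static int readIntWrapped(InputStream in) {
        try {
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(42);
        System.out.println(readIntByHand(new ByteArrayInputStream(bytes)));
        System.out.println(readIntWrapped(new ByteArrayInputStream(bytes)));
    }
}
```

Both reads return the same value. The ByteArrayInputStream here could be swapped for a FileInputStream or a socket's input stream without changing readIntWrapped, which is the same division of labor the book describes: the wrapper deals with data types, and the wrapped stream deals with where the bytes actually come from.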

Upvotes: 1
