How is data written to HDFS?

Question

I'm trying to understand how writes are managed in HDFS by reading the Hadoop 2.4.1 documentation.

According to the following diagram:

[Figure: HDFS architecture]

whenever a client writes something to HDFS, it has no contact with the NameNode and is itself in charge of chunking and replication. I assume that in this case the client is a machine running an HDFS shell (or equivalent).
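As a rough illustration of the "chunking" mentioned above: HDFS stores each file as a sequence of fixed-size blocks, all the same size except possibly the last one (the size comes from `dfs.blocksize`, 134217728 bytes, i.e. 128 MB, by default in Hadoop 2.x). The toy Python sketch below is not Hadoop code; `split_into_blocks` is a hypothetical helper, and the block size is shrunk to 6 bytes so the result is readable.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Toy model (hypothetical helper): cut a byte string into consecutive blocks.

    Every block is exactly block_size bytes except possibly the last one,
    mirroring how an HDFS file is laid out as a sequence of blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Tiny illustrative block size; real HDFS defaults to 128 MB in Hadoop 2.x.
blocks = split_into_blocks(b"hello hdfs world", block_size=6)
print(blocks)  # [b'hello ', b'hdfs w', b'orld']
```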

However, I don't understand how this fits with the rest of the documentation, which also says:

The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
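The replication this quote refers to is driven by the NameNode: when it notices a block with fewer live replicas than its replication factor, it instructs a DataNode that already holds a replica to copy it to another DataNode (in HDFS these instructions are returned to DataNodes in the replies to their heartbeats). The sketch below is a simplified toy model under those assumptions; `TransferCommand` and `plan_replication` are hypothetical names, and it ignores HDFS's rack-aware placement and replica-selection rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferCommand:
    """Hypothetical stand-in for a NameNode -> DataNode "copy this block" order."""
    block_id: str
    source: str
    target: str


def plan_replication(block_map: dict[str, set[str]],
                     live_nodes: list[str],
                     replication: int = 3) -> list[TransferCommand]:
    """Toy NameNode logic: for each under-replicated block, ask a DataNode
    that already holds a replica to copy it to a live node that does not."""
    commands = []
    for block_id in sorted(block_map):
        holders = block_map[block_id]
        if not holders:
            continue  # no surviving replica to copy from
        missing = replication - len(holders)
        candidates = [n for n in live_nodes if n not in holders]
        source = sorted(holders)[0]  # naive choice; HDFS is smarter here
        for target in candidates[:max(missing, 0)]:
            commands.append(TransferCommand(block_id, source, target))
    return commands


# blk_2 has only two replicas left (say dn3 died), so one copy is scheduled.
block_map = {"blk_1": {"dn1", "dn2", "dn4"}, "blk_2": {"dn1", "dn2"}}
for cmd in plan_replication(block_map, live_nodes=["dn1", "dn2", "dn4"]):
    print(cmd)  # TransferCommand(block_id='blk_2', source='dn1', target='dn4')
```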

Is the diagram above correct? If so:

Is the NameNode only informed of new files when it receives a Blockreport (which, I suppose, can take a while)?
Why does the client write to multiple nodes?

If the diagram is not correct, how does file creation work in HDFS?
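For comparison with the diagram, here is a toy Python model of the write path as the HDFS Architecture guide describes replication pipelining: the client asks the NameNode to create the file and to allocate each block, and gets back a list of DataNodes; it then sends the block only to the first DataNode, which stores it and forwards it to the next one, and so on down the pipeline. In the toy, each DataNode notifies the NameNode as soon as it stores a replica, loosely standing in for incremental block reports. All class and function names, the round-robin placement and the path `/user/demo.txt` are hypothetical; real HDFS streams small packets over RPC and uses a rack-aware placement policy.

```python
class ToyNameNode:
    """Hypothetical NameNode: keeps metadata only, never block contents."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}        # path -> list of block ids
        self.locations = {}    # block id -> set of DataNode names
        self.completed = set()
        self._block_counter = 0
        self._rr = 0           # round-robin cursor for placement

    def create(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_block(self, path):
        """Allocate a new block and choose the DataNode pipeline for it."""
        block_id = f"blk_{self._block_counter}"
        self._block_counter += 1
        self.files[path].append(block_id)
        n = len(self.datanodes)
        # Round-robin placement; real HDFS uses a rack-aware policy.
        pipeline = [self.datanodes[(self._rr + i) % n]
                    for i in range(min(self.replication, n))]
        self._rr += 1
        return block_id, pipeline

    def block_received(self, block_id, node_name):
        self.locations.setdefault(block_id, set()).add(node_name)

    def complete(self, path):
        self.completed.add(path)


class ToyDataNode:
    """Hypothetical DataNode: stores a replica, then forwards it downstream."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def receive(self, block_id, data, downstream, namenode):
        self.blocks[block_id] = data                  # store the local replica
        namenode.block_received(block_id, self.name)  # report it right away
        acks = []
        if downstream:  # forward to the next DataNode in the pipeline
            acks = downstream[0].receive(block_id, data, downstream[1:], namenode)
        # Acks travel back up the pipeline toward the client.
        return [self.name] + acks


def write_file(namenode, path, data, block_size):
    """Toy client: metadata goes to the NameNode, bytes go to DataNodes."""
    namenode.create(path)
    for offset in range(0, len(data), block_size):
        block_id, pipeline = namenode.add_block(path)
        chunk = data[offset:offset + block_size]
        # The client sends each block only to the FIRST DataNode.
        acks = pipeline[0].receive(block_id, chunk, pipeline[1:], namenode)
        if len(acks) != len(pipeline):
            raise IOError(f"pipeline for {block_id} did not fully acknowledge")
    namenode.complete(path)


nodes = [ToyDataNode(f"dn{i}") for i in range(1, 5)]
nn = ToyNameNode(nodes, replication=3)
write_file(nn, "/user/demo.txt", b"hello hdfs world", block_size=6)
print(nn.files)                       # {'/user/demo.txt': ['blk_0', 'blk_1', 'blk_2']}
print(sorted(nn.locations["blk_1"]))  # ['dn2', 'dn3', 'dn4']
```

In this model the client pushes each block over its own link only once, and the extra copies are made DataNode to DataNode.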

