Request clarification on some HDFS concepts

Question

I am not sure whether this question belongs here; if not, I apologize. I am reading the HDFS paper and am finding a few of its terms difficult to understand. My questions are below.

1) As per the paper, "The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas." What exactly does namespace information mean in an inode? Does it mean the complete path of the file? I ask because the first sentence of that passage says "The HDFS namespace is a hierarchy of files and directories".

2) As per the paper, "The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes (the physical location of file data)." Are the namespace tree and the namespace the same thing? Please refer to point 1 for the definition of the namespace. How is the namespace tree information stored? Is it stored as part of the inodes, with each inode also holding a pointer to its parent inode?

3) As per the paper, "HDFS keeps the entire namespace in RAM. The inode data and the list of blocks belonging to each file comprise the metadata of the name system called the image." Does the image also contain the namespace?

4) What is the use of a namespace ID? Is it used to distinguish between two different file system instances?
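
To make questions 1 to 3 concrete, here is a minimal Python sketch of my current mental model. Every name in it (Inode, add_child, full_path, block_locations, the dn1..dn4 labels) is made up for illustration and is not taken from the HDFS code or the paper. In particular, the parent pointer and the separate block-to-DataNode map are my assumptions, which I would like confirmed or corrected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inode:
    """Hypothetical inode holding the attributes the paper lists."""
    name: str                       # only the last path component, not the full path
    is_dir: bool
    # Assumed parent pointer; excluded from repr/compare to avoid cycles.
    parent: Optional["Inode"] = field(default=None, repr=False, compare=False)
    permissions: int = 0o644
    mtime: float = 0.0              # modification time
    atime: float = 0.0              # access time
    ns_quota: Optional[int] = None  # namespace quota (directories only)
    ds_quota: Optional[int] = None  # disk space quota (directories only)
    children: Dict[str, "Inode"] = field(default_factory=dict)
    block_ids: List[int] = field(default_factory=list)  # files only


def add_child(parent: Inode, child: Inode) -> Inode:
    """Link child under parent, so the inodes themselves form the tree."""
    child.parent = parent
    parent.children[child.name] = child
    return child


def full_path(inode: Inode) -> str:
    """Rebuild the complete path by walking parent pointers up to the root."""
    parts = []
    node = inode
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


# Namespace tree: / -> user -> data.txt
root = Inode(name="", is_dir=True, permissions=0o755)
user = add_child(root, Inode(name="user", is_dir=True, permissions=0o755))
data = add_child(user, Inode(name="data.txt", is_dir=False, block_ids=[101, 102]))

# Assumed separate mapping of blocks to DataNodes (the physical locations).
block_locations = {101: ["dn1", "dn2", "dn3"], 102: ["dn2", "dn3", "dn4"]}

print(full_path(data))
```

In this model the full path is never stored in an inode; it is derived from the tree. Is that roughly how the NameNode works, and is the "image" then just the inodes plus their block lists?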

Thanks,

Venkat

