Venk K

Reputation: 1177

Request clarification on some HDFS concepts

I am not sure if this question belongs here. If not, then I apologize. I am reading the HDFS paper and am finding it difficult to understand a few terms. Please find my questions below.

1) As per the paper, "The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas." What exactly does namespace information mean in an inode? Does it mean the complete path of the file? I ask because the previous statement says "The HDFS namespace is a hierarchy of files and directories".

2) As per the paper "The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes (the physical location of file data)." Are both namespace tree and namespace the same? Please refer to point 1 about definition of the namespace. How is the namespace tree information stored? Is it stored as part of inodes where each inode will also have a parent inode pointer?

3) As per the paper, "HDFS keeps the entire namespace in RAM. The inode data and the list of blocks belonging to each file comprise the metadata of the name system called the image." Does the image also contain the namespace?

4) What is the use of a namespace id? Is it used to distinguish between two different file system instances?

Thanks,

Venkat

Upvotes: 0

Views: 380

Answers (1)

Brugere

Reputation: 436

What exactly does namespace information mean in an inode? Does it mean the complete path of the file? I ask because the previous statement says "The HDFS namespace is a hierarchy of files and directories".

It means that you can browse your files as you would on your local system (via commands like hadoop dfs -ls) and you will see results like /user/hadoop/myFile.txt, but physically the file is split into several blocks that are distributed across your cluster and replicated according to your replication factor.
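One way to picture the hierarchy is as a tree of nodes from which a path is rebuilt by walking upwards. The sketch below is only a toy model: it assumes each inode stores its own name plus a link to its parent, and the `INode` class and its fields are made up for illustration, not Hadoop's actual classes.

```python
class INode:
    """Toy inode: a name, a parent pointer and a dict of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            # Register this node under its parent so the tree can be walked down too.
            parent.children[name] = self

    def full_path(self):
        """Rebuild the absolute path by following parent pointers up to the root."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


# Hypothetical namespace matching the listing above.
root = INode("")
user = INode("user", root)
hadoop = INode("hadoop", user)
my_file = INode("myFile.txt", hadoop)
```

In this model, `my_file.full_path()` gives back /user/hadoop/myFile.txt even though no single node stores that string.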

Are both namespace tree and namespace the same? Please refer to point 1 about definition of the namespace. How is the namespace tree information stored? Is it stored as part of inodes where each inode will also have a parent inode pointer?

When you copy a file to HDFS with a command like hadoop dfs -copyFromLocal myfile.txt /user/hadoop/myfile.txt, the file is split according to the dfs.block.size value (default is 64MB). The blocks are then distributed across your datanodes (the nodes used for storage). The namenode keeps a map of all blocks in order to verify your data integrity when it starts (or when you run commands like hadoop fsck /).
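As a rough illustration of the splitting step, the sketch below computes the block lengths for a file of a given size. The function name `split_into_blocks` is hypothetical and the 64 MB default simply mirrors the value mentioned above; this is plain arithmetic, not HDFS code.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the list of block lengths (in bytes) for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes; it is not padded.
        blocks.append(remainder)
    return blocks
```

For example, `split_into_blocks(200 * MB)` yields three full 64 MB blocks followed by one 8 MB block.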

Does the image also contain the namespace?

For this one I am not sure but I think the namespace is in RAM too.

What is the use of a namespace id? Is it used to distinguish between two different file system instances?

Yes, the namespace ID is just an identifier; it ensures that the data on the datanodes stays coherent with the file system instance they belong to.
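To make this a bit more concrete: as far as I understand the paper, a datanode presents its stored namespace ID to the namenode at startup and shuts down if it does not match, while a freshly initialized datanode without an ID receives the cluster's one. A minimal sketch of that check, where `handshake` and `NamespaceMismatch` are hypothetical names and not the real Hadoop API:

```python
class NamespaceMismatch(Exception):
    """Raised when a datanode belongs to a different file system instance."""


def handshake(namenode_namespace_id, datanode_namespace_id=None):
    """Return the namespace ID the datanode should use after the handshake."""
    if datanode_namespace_id is None:
        # Newly initialized datanode: it adopts the cluster's namespace ID.
        return namenode_namespace_id
    if datanode_namespace_id != namenode_namespace_id:
        # Data written for another instance must not be mixed into this cluster.
        raise NamespaceMismatch(
            f"datanode has {datanode_namespace_id}, "
            f"namenode expects {namenode_namespace_id}"
        )
    return datanode_namespace_id
```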

I hope that helps, even though it is far from an exhaustive explanation.

Upvotes: 2
