Reputation: 3
Can anyone help me understand the data storage concept of Hadoop?
As I understand it, Hadoop deals with an fsimage and data blocks, and the paths for the fsimage and edit logs are set in hdfs-site.xml. But what about the data blocks? Can anyone help me with this? I am a little confused about where the /user and /tmp directories actually live in the filesystem.
I used this link to set up a single node hadoop cluster: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Upvotes: 0
Views: 312
Reputation: 1411
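The NameNode's FSImage (together with the edit log) records the HDFS namespace: the directory tree and which blocks make up each file. Which DataNode holds each block is not persisted there; the DataNodes report their blocks to the NameNode at runtime. In the hdfs-site.xml file, the property 'dfs.data.dir' defines where a DataNode stores the underlying block files on its local filesystem. This can be a comma-separated list of directories (think multiple disks). If it is not set, it defaults to ${hadoop.tmp.dir}/dfs/data.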
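To make the comma-separated list concrete, here is a minimal Python sketch that reads 'dfs.data.dir' out of an hdfs-site.xml document and splits it into individual directories. The XML content and the /disk1 and /disk2 paths are made up for illustration, and the helper names get_property and data_dirs are hypothetical, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; all paths are invented for illustration.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/app/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hdfs/data, /disk2/hdfs/data</value>
  </property>
</configuration>
"""


def get_property(xml_text, key):
    """Return the <value> of the <property> whose <name> equals key, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


def data_dirs(xml_text, key="dfs.data.dir"):
    """Split the comma-separated directory list into a clean Python list."""
    value = get_property(xml_text, key)
    if value is None:
        return []
    return [d.strip() for d in value.split(",") if d.strip()]


if __name__ == "__main__":
    for directory in data_dirs(HDFS_SITE):
        print(directory)
```

On a real single-node setup you would point this at your own hdfs-site.xml instead of the inline string; the directories it prints are the local paths where the block files end up.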
Upvotes: 0
Reputation: 1274
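Files are split into blocks and stored in the Hadoop Distributed File System (HDFS). Directories such as /user and /tmp exist in the HDFS namespace rather than as ordinary directories on the local disk. Consult the HDFS module of Yahoo's Hadoop Tutorial for a description of HDFS. The directories stored in HDFS can be viewed by typing the following command into a terminal: hadoop dfs -ls (append / to list the HDFS root).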
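As a rough illustration of how a file is cut into blocks, the Python sketch below computes the offset and length of each block for a given file size. It assumes a 64 MB block size, which was the default in Hadoop 1.x releases; your cluster may be configured differently. The function name split_into_blocks is hypothetical and this is only arithmetic, not how HDFS itself is implemented.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return a list of (offset, length) pairs, one per block.

    block_size defaults to 64 MB, an assumption matching Hadoop 1.x defaults.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block only covers whatever bytes remain.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    for index, (offset, length) in enumerate(split_into_blocks(150 * MB)):
        print(index, offset // MB, length // MB)
```

With these assumptions, a 150 MB file yields two full 64 MB blocks and one 22 MB block. Each block is stored as a separate file on a DataNode's local disk, which is why the original file never appears under its HDFS name on the local filesystem.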
Upvotes: 3