Nitin

Reputation: 2874

Hadoop terminology mapping to hardware

I am starting out with Hadoop and trying to implement a Hadoop cluster. I am new to distributed systems, so I am a bit confused by the terminology.

Upvotes: 0

Views: 173

Answers (1)

Chris White

Reputation: 30089

Firstly (on a terminology front), I assume you mean instantiate a Hadoop cluster rather than implement one.

  • A namenode manages one or more datanodes. The index of file names to block IDs is maintained by the namenode in memory and periodically flushed to disk. The datanodes report the actual locations of the blocks to the namenode, which from that point manages the assignment, migration, replication and removal of blocks.
  • A datanode manages the storage of blocks on physical hard disks. A datanode can distribute its blocks over one or more physical disks (in fact, you're encouraged to use multiple physical disks rather than a single logical volume of disks).
  • The Job Tracker (JT) manages the assignment of tasks (either map or reduce) to one or more Task Trackers (TT). Typically you will configure each node (physical machine) in your cluster so that the maximum number of tasks that can run at once (map / reduce) matches the number of cores (this is not a hard and fast rule; it depends on how you expect to use the cluster).
  • A node typically means a physical machine, which usually runs both a Task Tracker (which runs map / reduce tasks) and a datanode (storing / serving up file blocks).
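
To make the namenode / datanode split above concrete, here is a toy sketch in Python. It is not Hadoop code (Hadoop itself is written in Java), and the names `NameNode`, `DataNode`, `receive_block_report` and `locate` are hypothetical, chosen only for illustration. It assumes a simple "least-filled disk" placement, which is my own simplification rather than Hadoop's actual policy.

```python
from collections import defaultdict


class NameNode:
    """Toy namenode: keeps the file -> block ID index in memory and
    learns where blocks live only from datanode reports."""

    def __init__(self):
        self.file_blocks = {}                    # file name -> ordered block IDs
        self.block_locations = defaultdict(set)  # block ID -> datanode names

    def add_file(self, name, block_ids):
        self.file_blocks[name] = list(block_ids)

    def receive_block_report(self, datanode_name, block_ids):
        # Forget what this datanode reported earlier, then record the new report.
        for holders in self.block_locations.values():
            holders.discard(datanode_name)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_name)

    def locate(self, name):
        # For each block of the file, list the datanodes known to hold it.
        return [(b, sorted(self.block_locations.get(b, ())))
                for b in self.file_blocks[name]]


class DataNode:
    """Toy datanode: spreads its blocks over several physical disks."""

    def __init__(self, name, disks):
        self.name = name
        self.disks = {disk: set() for disk in disks}  # disk path -> block IDs

    def store_block(self, block_id):
        # Assumed policy for illustration: pick the disk holding the fewest blocks.
        disk = min(self.disks, key=lambda d: len(self.disks[d]))
        self.disks[disk].add(block_id)
        return disk

    def block_report(self):
        # The full list of blocks this datanode holds, across all of its disks.
        return sorted(b for blocks in self.disks.values() for b in blocks)
```

In use, you would create one `NameNode` and several `DataNode` objects, store blocks on the datanodes, and pass each `block_report()` to `receive_block_report`. After that, `locate` can answer which machines hold each block of a file, even though the namenode itself stores no block data.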

Upvotes: 1
