Bala

Reputation: 79

Difference between Hadoop 1 and Hadoop 2

As far as I know, there is only one difference between Hadoop 1 and 2.

It is the active and passive Secondary NameNodes.

Could someone list the differences between Hadoop 1 and 2?

Upvotes: 3

Views: 14422

Answers (3)

Snigh

Reputation: 461

Hadoop 1

  1. Hadoop 1.x supports only the MapReduce (MR) processing model; it does not support non-MR tools.
  2. MR does both processing and cluster resource management.
  3. 1.x has limited scalability: about 4000 nodes per cluster.
  4. Works on the concept of slots: a slot can run either a Map task or a Reduce task only.
  5. A single NameNode manages the entire namespace.
  6. 1.x has a single point of failure (SPOF) because of the single NameNode; if the NameNode fails, manual intervention is needed to recover.
  7. The MR API is compatible with Hadoop 1.x. A program written for Hadoop 1 runs on Hadoop 1.x without any additional files.
  8. 1.x is limited as a platform for event processing, streaming, and real-time operations.
  9. The default HDFS block size is 64 MB.

Hadoop 2

  1. Hadoop 2.x supports MR as well as other distributed computing models such as Spark, Hama, Giraph, Message Passing Interface (MPI), and HBase coprocessors.
  2. YARN (Yet Another Resource Negotiator) does cluster resource management, and processing is done by the different processing models.
  3. 2.x has better scalability: up to about 10000 nodes per cluster.
  4. Works on the concept of containers. A container can run generic tasks.
  5. Multiple NameNode servers manage multiple namespaces (HDFS Federation).
  6. 2.x overcomes the SPOF with a standby NameNode, and it can be configured for automatic failover if the active NameNode fails.
  7. A program written against the Hadoop 1.x MR API requires additional files to run on Hadoop 2.x.
  8. Can serve as a platform for a wide variety of data analytics: it is possible to run event processing, streaming, and real-time operations.
  9. The default HDFS block size is 128 MB.
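The 64 MB and 128 MB figures in item 9 of each list are the default HDFS block sizes. As a rough illustration of what that change means, here is a minimal sketch that counts how many blocks a file would occupy under each default. The helper name `block_count` is made up for this example and is not part of any Hadoop API, and real clusters can override the block size in configuration.

```python
MB = 1024 * 1024

def block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold a file (hypothetical helper).

    The last block may be only partially filled, so round up.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-file_size_bytes // block_size_bytes)

one_gb = 1024 * MB
hadoop1_blocks = block_count(one_gb, 64 * MB)   # Hadoop 1 default block size
hadoop2_blocks = block_count(one_gb, 128 * MB)  # Hadoop 2 default block size
print(hadoop1_blocks, hadoop2_blocks)
```

Fewer blocks per file means fewer block entries for the NameNode to track for the same amount of data.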

Upvotes: 15

Priya Gupta

Reputation: 11

1) Scalability - The work of handling the tasks running on the slaves is delegated to the ApplicationMaster, which reduces the load on the ResourceManager (RM). The RM can therefore handle more requests than the JobTracker could, which makes it possible to add more nodes.

2) Unlike MRv1, which is tightly coupled with MapReduce, YARN supports many kinds of applications running on it, such as MR2, Tez, Storm, and Spark.

3) Optimized resource allocation - Unlike MRv1, YARN has no fixed number of slots allocated separately to mappers and reducers, so the available capacity of a node can be used by any task that needs resources.

4) When the ResourceManager fails, the jobs running on the cluster do not need to be restarted after the ResourceManager recovers.

5) The failover mechanism is implemented with ZooKeeper (ZK) and is already part of the ResourceManager, so we don't need to run another daemon.
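Point 3 (fixed slots versus shared capacity) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one node, memory as the only resource, identical task sizes, and no scheduler queues. The function names and the numbers are invented for the example and are not Hadoop APIs or defaults.

```python
def mrv1_running(pending_map, pending_reduce, map_slots, reduce_slots):
    """MRv1-style node: map tasks may only use map slots,
    reduce tasks may only use reduce slots."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)

def yarn_running(pending_tasks, node_memory_mb, container_memory_mb):
    """YARN-style node: any task gets a container while memory remains."""
    max_containers = node_memory_mb // container_memory_mb
    return min(pending_tasks, max_containers)

# Six map tasks waiting, no reduce tasks yet.
# Assumed MRv1 node: 4 map slots + 2 reduce slots (the reduce slots sit idle).
slots_result = mrv1_running(pending_map=6, pending_reduce=0,
                            map_slots=4, reduce_slots=2)
# Assumed YARN node: 6 GB usable memory, 1 GB requested per container.
containers_result = yarn_running(pending_tasks=6, node_memory_mb=6144,
                                 container_memory_mb=1024)
print(slots_result, containers_result)
```

In this model the slot-based node runs only four of the six tasks because the two reduce slots cannot be used for map work, while the container-based node runs all six.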

Please look here for more details.

Upvotes: 1

programmerbyheart

Reputation: 99

There is a major architectural improvement in Hadoop 2: it introduces a distributed operating system layer known as YARN (Yet Another Resource Negotiator). Resource management (memory and CPU) is now handled by YARN.

Also, high availability (HA) was introduced for the NameNode.

Upvotes: 2
