Reputation: 1
I am reading up on the basics of YARN and the Hadoop file system (HDFS). Several blogs told me that YARN is just a resource management system and that HDFS is about storage. But I encountered the following lines in the book Hadoop: The Definitive Guide:
From these lines, I infer that there is some connection between the location of the DataNodes and the NodeManager nodes. Maybe they can be on the same machine. That seems to contradict what I learned from the blogs.
Can anyone help explain this?
I googled "connection between Datanode and Node Manager" a lot and could not find a direct answer.
Upvotes: 0
Views: 294
Reputation: 5125
Think of YARN as the OS, the compute power, and HDFS as the disk.
It's beneficial to move the compute to a node where the data is located. A node will often run both a NodeManager, which manages the compute (YARN), and a DataNode, which stores the data (HDFS). So both a container and the files for a YARN/Hadoop job can be co-located on one node/server. You can also have a NodeManager on a node that isn't a DataNode, and you can have a DataNode that isn't a NodeManager. The two are independent, but it frequently makes sense to co-locate them to take advantage of data locality. After all, who wants an OS without a disk? (There is actually a use case for this, but let's not get into "compute nodes".)
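To make the locality idea concrete, here is a toy sketch in Python. It is not YARN's actual scheduler and does not use any Hadoop API; the `Node` class, the `pick_node` function, and the host names are all made up for illustration. It only shows the preference described above: run the task on a node that has both a NodeManager and the data, and otherwise fall back to any NodeManager and read the block over the network. Real schedulers also consider things like rack locality, which are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical cluster node; the two roles are independent flags."""
    name: str
    runs_node_manager: bool  # can host YARN containers (compute)
    runs_data_node: bool     # can store HDFS blocks (storage)


def pick_node(nodes, block_hosts):
    """Pick a node for a task, preferring one that already holds the block.

    block_hosts is the set of node names that store a replica of the block.
    """
    # Only nodes with a NodeManager can run a container at all.
    compute_nodes = [n for n in nodes if n.runs_node_manager]
    if not compute_nodes:
        raise ValueError("no NodeManager available")

    # First choice: a node that is both a compute node and holds the data.
    for node in compute_nodes:
        if node.runs_data_node and node.name in block_hosts:
            return node, "node-local"

    # Fallback: any compute node; the block is read over the network.
    return compute_nodes[0], "remote read"


# Illustrative cluster: two co-located workers, one compute-only node,
# and one storage-only node.
cluster = [
    Node("worker1", runs_node_manager=True, runs_data_node=True),
    Node("worker2", runs_node_manager=True, runs_data_node=True),
    Node("compute1", runs_node_manager=True, runs_data_node=False),
    Node("storage1", runs_node_manager=False, runs_data_node=True),
]

# Block replicated on worker2 and storage1: the task lands on worker2.
print(pick_node(cluster, {"worker2", "storage1"}))

# Block only on the storage-only node: some NodeManager must read it remotely.
print(pick_node(cluster, {"storage1"}))
```

In the first call the task runs next to its data, which is the case co-location is meant to enable. In the second call the job still works, it just loses the locality benefit, which is why the two services are independent but usually deployed together.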
Upvotes: 2