student_WYC

Reputation: 1

What is the connection/relationship between Datanodes in HDFS and node manager on Yarn?

I am reading about the basics of YARN and the Hadoop file system (HDFS). Some blogs told me that YARN is just a resource management system and that HDFS is about storage. But I came across the following lines in the book Hadoop: The Definitive Guide: [image: excerpt from the book] From these lines, I infer that there should be some connection between the location of the DataNodes and the NodeManager nodes. Maybe they can be on the same machine. That seems to contradict what I learned from the blogs. Can anyone help explain this?

I searched Google a lot for "connection between Datanode and Node Manager" and could not find a direct answer.

Upvotes: 0

Views: 294

Answers (1)

Matt Andruff

Reputation: 5125

Think of YARN as the OS (the compute power) and HDFS as the disk.

It is beneficial to move the compute to a node where the data is located. A node will often run both a NodeManager, which manages the compute (YARN), and a DataNode (HDFS). So both a container and the files for a YARN/Hadoop job can be colocated on one node/server. You can also have a NodeManager on a node that isn't a DataNode, and you could have a DataNode that isn't a NodeManager. The two are independent, but it frequently makes sense to colocate them to take advantage of data locality. After all, who wants an OS without a disk? (There is actually a use case for this, but let's not get into "compute nodes".)
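To make the data locality idea concrete, here is a toy sketch in Python. It is not YARN's actual scheduler or API: the `Node` class, the `pick_node` function, and the host names are all made up for illustration, and it ignores things a real cluster considers, such as rack locality and available resources. It only shows the preference described above: run the task on a node that has both a NodeManager and a local copy of the block, and otherwise fall back to any node that can run containers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a cluster host (not a real Hadoop class)."""
    name: str
    has_node_manager: bool  # can run YARN containers
    has_data_node: bool     # stores HDFS blocks


def pick_node(nodes, block_locations):
    """Return (node, is_local) for a task that reads one HDFS block.

    block_locations is the set of host names holding a replica of the block.
    """
    compute_nodes = [n for n in nodes if n.has_node_manager]
    if not compute_nodes:
        raise ValueError("no NodeManager available")
    # Prefer a node where compute and data are colocated.
    for node in compute_nodes:
        if node.has_data_node and node.name in block_locations:
            return node, True
    # Otherwise the block has to be read over the network.
    return compute_nodes[0], False


cluster = [
    Node("worker1", has_node_manager=True, has_data_node=True),
    Node("worker2", has_node_manager=True, has_data_node=True),
    Node("compute1", has_node_manager=True, has_data_node=False),  # compute only
    Node("storage1", has_node_manager=False, has_data_node=True),  # storage only
]

node, is_local = pick_node(cluster, {"worker2", "storage1"})
print(node.name, is_local)

node, is_local = pick_node(cluster, {"storage1"})
print(node.name, is_local)
```

In the first call a replica lives on `worker2`, which also runs a NodeManager, so the task can run next to its data. In the second call the only replica is on a storage-only host, so the task still runs, just without locality. That is why the two services are independent but usually colocated.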

Upvotes: 2
