Vijayanand

Reputation: 500

AWS EMR with only master and task nodes

Is it possible to build an AWS EMR cluster with a master node and a set of task (slave) nodes, without core nodes, when I am sure that the source data is in S3 and the processed result will be stored in S3?

Basically, the question is: what is the need for a DataNode process when EMR is going to process data in S3 (where we do not store or use anything in HDFS)?

Upvotes: 7

Views: 1421

Answers (1)

ChristopherB

Reputation: 2068

Core nodes in EMR provide compute resources as well as HDFS storage. In Hadoop 2.x, the compute side is provided by the YARN NodeManager. Even if an application's input and output are both on S3, YARN (and often other application layers such as Hive) uses HDFS to stage jars, split info, session data, etc.
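
One way to act on this, sketched here as an assumption rather than a recommendation from the answer itself, is to keep a small core group for HDFS staging and put most of the compute in task nodes. The helper name `build_instance_groups`, the instance type, and the counts below are hypothetical; the dictionary keys follow the `InstanceGroups` shape accepted by boto3's `run_job_flow`.

```python
def build_instance_groups(task_count, core_count=1, instance_type="m5.xlarge"):
    """Build an InstanceGroups list for an EMR cluster.

    Hypothetical helper: keeps at least one core node so that YARN and
    other layers have HDFS available for staging, and places the rest
    of the capacity in task nodes. Instance type and counts are
    illustrative defaults only.
    """
    if core_count < 1:
        raise ValueError("at least one core node is assumed, to provide HDFS")
    if task_count < 0:
        raise ValueError("task_count must not be negative")

    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core (HDFS staging)",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    # Task nodes run YARN containers but do not host HDFS.
    if task_count > 0:
        groups.append(
            {
                "Name": "Task (compute only)",
                "InstanceRole": "TASK",
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


if __name__ == "__main__":
    for group in build_instance_groups(task_count=4):
        print(group["InstanceRole"], group["InstanceCount"])
```

The resulting list could be passed as `Instances={"InstanceGroups": ...}` to `run_job_flow` on a boto3 EMR client, together with the other required arguments such as the cluster name, release label, and IAM roles.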

Upvotes: 2
