baiduXiu

Reputation: 167

Getting number of cores for EMR cluster

I am using 3 r4.2xlarge instances for the slave nodes in my EMR cluster, each of which has 8 CPUs. How do I determine the number of cores available in the cluster? I used the following command to determine this:

grep cores /proc/cpuinfo

It says I have 1 core per CPU.

For a Spark ETL job, is it better to use the R series or the C series of AWS instances? Also, is the above command the right way to determine the cores available for the cluster?

Upvotes: 0

Views: 3134

Answers (1)

Dunedan

Reputation: 8435

The number of cores in your EMR cluster is simply the number of core/task instances multiplied by the number of cores of the instance type you're using. So in your case it'd be:

3 instances * 8 cores (per r4.2xlarge) = 24 cores
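If you want to verify the vCPU count of an instance type without logging into a node, the AWS CLI can report it. A minimal sketch, assuming the AWS CLI is installed and configured:

# Ask EC2 for the default vCPU count of the r4.2xlarge instance type
aws ec2 describe-instance-types \
    --instance-types r4.2xlarge \
    --query "InstanceTypes[0].VCpuInfo.DefaultVCpus" \
    --output text
# Should print 8 for r4.2xlarge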

I assume you're confused by the content of /proc/cpuinfo. If you look at it without grepping for cores you'll see multiple processors mentioned. Maybe check out: How to obtain the number of CPUs/cores in Linux from the command line?
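For example, on one of the nodes you can count the logical processors directly instead of grepping for the cores field:

# Count logical processors (what YARN/Spark see as vCPUs)
nproc
# Equivalent: count "processor" entries in /proc/cpuinfo
grep -c ^processor /proc/cpuinfo
# Both should report 8 on an r4.2xlarge node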

Keep in mind that this total number of CPUs is not necessarily the number of CPUs working on your tasks, as that also depends on the configuration of Hadoop/Spark.
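For instance, with Spark on YARN the parallelism is bounded by the executor settings rather than by the raw hardware. A hypothetical spark-submit call to illustrate this (the job name and the numbers are illustrative, not tuned values):

# 3 executors x 7 cores each = 21 cores in use, even though 24 exist;
# leaving one vCPU per node free for the OS/NodeManager is common practice
spark-submit \
    --master yarn \
    --num-executors 3 \
    --executor-cores 7 \
    --executor-memory 50G \
    my_etl_job.py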

Regarding the instance types: which type to choose depends on your workload. For a memory-heavy workload (as Spark jobs usually are), EC2 instances from the memory-optimized R families are probably a better choice than instances from the compute-optimized C families.

Upvotes: 1
