Reputation: 3938
Now that EMR supports shrinking the number of core nodes, suppose I create an EMR cluster with one of the core nodes as a Spot Instance. What happens when the Spot price exceeds my bid price for that core node? Will EMR gracefully decommission it?
Here is Amazon's description of the process of shrinking the number of core nodes:
On core nodes, both the YARN NodeManager and HDFS DataNode daemons must be decommissioned in order for the instance group to shrink. For YARN, graceful shrink ensures that a node marked for decommissioning is only transitioned to the DECOMMISSIONED state if there are no pending or incomplete containers or applications. The decommissioning finishes immediately if there are no running containers on the node at the beginning of decommissioning.
For HDFS, graceful shrink ensures that the target capacity of HDFS is large enough to fit all existing blocks. If the target capacity is not large enough, only some of the core instances are decommissioned, such that the remaining nodes can handle the data currently residing in HDFS. You should ensure there is additional HDFS capacity before attempting further decommissioning. You should also try to minimize write I/O before shrinking instance groups, as heavy writes may delay the completion of the resize operation.
Another limit is the default replication factor, dfs.replication, in /etc/hadoop/conf/hdfs-site.xml. Amazon EMR configures the value based on the number of instances in the cluster: 1 for clusters with 1-3 instances, 2 for clusters with 4-9 instances, and 3 for clusters with 10+ instances. Graceful shrink does not allow you to shrink the number of core nodes below the HDFS replication factor; this prevents HDFS from being unable to close files due to insufficient replicas. To circumvent this limit, you must lower the replication factor and restart the NameNode daemon.
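The sizing rule quoted above is simple enough to capture in a short helper. This is just an illustrative sketch (the function name is mine, not an EMR API); it encodes the replication defaults and the resulting floor on graceful shrink:

```python
def emr_default_replication(core_instances: int) -> int:
    """Default dfs.replication that EMR picks at cluster creation:
    1 for 1-3 instances, 2 for 4-9, 3 for 10 or more."""
    if core_instances <= 3:
        return 1
    if core_instances <= 9:
        return 2
    return 3
```

Since graceful shrink will not reduce the core group below the cluster's current dfs.replication, a cluster created with 10+ core nodes (default replication 3) cannot be shrunk below 3 core nodes without first lowering the replication factor and restarting the NameNode.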
Upvotes: 0
Views: 306
Reputation: 1386
I think it might not be possible to gracefully decommission the node in the case of a Spot price spike (the general case, with N core nodes). A Spot Instance receives a two-minute interruption notice before it is reclaimed due to a price spike. Even if that notice is captured, two minutes may not be enough to guarantee that the node's HDFS data is fully decommissioned.
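The two-minute notice is surfaced through the EC2 instance metadata service at /latest/meta-data/spot/instance-action, which returns 404 until an interruption is scheduled. A minimal polling sketch (the function names and the injectable `fetch` hook are mine, added for testability; this uses an IMDSv1-style request with no session token):

```python
import json
from urllib import request, error

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def _default_fetch():
    """Read the instance-action document from instance metadata;
    returns None while no interruption notice exists (the endpoint
    responds 404 until one is scheduled)."""
    try:
        with request.urlopen(METADATA_URL, timeout=2) as resp:
            return resp.read().decode()
    except error.URLError:
        return None

def interruption_action(fetch=_default_fetch):
    """Parsed instance-action document, e.g.
    {"action": "terminate", "time": "..."}, or None if no notice."""
    body = fetch()
    return json.loads(body) if body else None
```

In principle, a daemon on each Spot core node could poll this every few seconds and kick off decommissioning when a notice appears, though as noted above two minutes gives no guarantee that HDFS blocks will be fully re-replicated in time.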
Also, with only one core node in the cluster, decommissioning does not make much sense: the data held on that node would need to be moved to other core nodes, and none are available in this case. Once the only core node is lost, there needs to be a way to bring one back, or the cluster cannot run any tasks.
Shameless plug :) : I work for Qubole!
The following two blog posts might be useful on integrating Spot Instances with Hadoop clusters, including dealing with Spot price spikes:

- https://www.qubole.com/blog/product/riding-the-spotted-elephant
- https://www.qubole.com/blog/product/rebalancing-hadoop-higher-spot-utilization
Upvotes: 1