Reputation: 4232
I am using EMR with task instance groups as spot instances. I want to maintain minimum number of task instances always. Means, whenever EMR terminates task instances because of bid price goes higher than what we set, my application should launch another task instance with little higher bid price.
My research-
Questions
Upvotes: 0
Views: 1563
Reputation: 497
As has been correctly pointed, the EMR API provides all necessary ingredients to 1) collect monitoring data, and 2) programmatically scale the cluster up and down.
Basically, there are two main options to implement autoscaling for EMR clusters:
Both options have their pros and cons. The main advantage of option 2 is that it is a server-less approach (does not require to run your own server). Option 1, on the other hand, does require a server, but therefore comes with more control to customize the logic of your scaling rules. Also, it allows to keep searchable records of the history of the scaling decisions.
You could take a look at Themis, an EMR autoscaling framework developed at Atlassian. Themis implements the autoscaling loop as discussed in option 1 above. Current features include proactive as well as reactive autoscaling, support for spot/on-demand task nodes, it comes with a Web UI, and the tool is very easy to configure.
Upvotes: 1
Reputation: 71
I have had a similar problem, and I wanted to share one possible alternative. I have written a Java tool to dynamically resize an EMR cluster during the processing. It might help you. Check it out at:
http://www.lopakalogic.com/articles/hadoop-articles/dynamically-resize-emr/
The source code is available on Github
Upvotes: 0
Reputation: 270224
How Spot Prices Work
When an Amazon EC2 instance is launched with a spot price (including when launched from Amazon EMR), the instance will start if the current spot price is below the provided bid price. If the spot price rises above the bid price, the instance is terminated. Instances are only charged the current spot price.
Therefore, the logic of launching a new spot instance with a "little higher bid price" is not necessary. The instance will always be charged the current spot price, so simply bid as high as you are willing to pay for a spot instance. You will either pay less than the spot price (great!) or your instance will be terminated because the price has gone higher than you are willing to pay (in which case you don't want to pay a "little higher" for the instance).
If you wish to "maintain minimum number of task instances" at all times, then either pay the normal EMR charge (which means the instances won't be terminated) or bid a particularly large price for the spot instances, such as 2 x the normal price. Yes, you might occasionally pay more for instances, but on average your price will be quite low.
If you wish to be particularly sneaky, you could bid up to the normal price for the EC2 instances then, if instances are terminated, launch more task nodes without using spot pricing. That way, your instances won't be terminated and you won't pay more than the normal EC2 price. However, you would have to terminate and replace those instances when the spot price drops, otherwise you are paying too much. That's why it might be better just to provide a high bid price on your spot instances.
Bottom line: Use spot pricing, but bid a high price. You'll get a good price most of the time.
Upvotes: 3
Reputation: 14543
AWS EMR does not have a autoscaling option available. But you can use a work around and integrate Autoscaling using AWS SQS. This is a rough picture what you can integrate.
This is guide to AWS SQS Autoscaling.
https://docs.aws.amazon.com/autoscaling/latest/userguide/as-using-sqs-queue.html
Upvotes: 1