Suman

Reputation: 9571

Calculate running/cumulative cost of EC2 spot instance

I often run spot instances on EC2 (for Hadoop task jobs, temporary nodes, etc.). Some of these are long-running spot instances.

It's fairly easy to calculate the cost for on-demand or reserved EC2 instances, but how do I calculate the cost incurred for a specific node (or nodes) running as spot instances?

I am aware that the cost for a spot instance changes every hour depending on market rate - so is there any way to calculate the cumulative total cost for a running spot instance? Through an API or otherwise?

Upvotes: 7

Views: 5155

Answers (4)

amohr

Reputation: 475

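I've rewritten Suman's solution to work with boto3. Make sure start_time and end_time are UTC datetimes with tzinfo set: boto3 returns timezone-aware timestamps, and subtracting a naive datetime from them raises a TypeError.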

def get_spot_instance_pricing(ec2, instance_type, start_time, end_time, zone):
    result = ec2.describe_spot_price_history(InstanceTypes=[instance_type], StartTime=start_time, EndTime=end_time, AvailabilityZone=zone)
    assert 'NextToken' not in result or result['NextToken'] == ''

    total_cost = 0.0

    total_seconds = (end_time - start_time).total_seconds()
    total_hours = total_seconds / (60*60)
    computed_seconds = 0

    last_time = end_time
    for price in result["SpotPriceHistory"]:
        price["SpotPrice"] = float(price["SpotPrice"])

        available_seconds = (last_time - price["Timestamp"]).total_seconds()
        remaining_seconds = total_seconds - computed_seconds
        used_seconds = min(available_seconds, remaining_seconds)

        total_cost += (price["SpotPrice"] / (60 * 60)) * used_seconds
        computed_seconds += used_seconds

        last_time = price["Timestamp"]

    # Average hourly price over the requested time window
    avg_hourly_cost = total_cost / total_hours
    return avg_hourly_cost, total_cost, total_hours
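
The function above relies on the records arriving newest first and on everything fitting in one page (hence the assert). For longer time ranges, here is a minimal sketch of the same integration that sorts the records itself, so it can be fed records merged from several pages, e.g. collected with boto3's describe_spot_price_history paginator. The name spot_cost_from_history is made up for illustration, and the sketch assumes all records are for one availability zone and one product description (the ProductDescriptions filter can narrow that down). The result is an estimate from market prices, not a billing figure.

```python
from datetime import datetime, timezone


def spot_cost_from_history(records, start_time, end_time):
    """Estimate the cost (USD) of one instance running from start_time to end_time.

    `records` is an iterable of dicts with "Timestamp" (tz-aware datetime)
    and "SpotPrice" (string or number), in any order.
    """
    # Sort oldest first so each price applies until the next price change.
    ordered = sorted(records, key=lambda r: r["Timestamp"])
    total_cost = 0.0
    for i, record in enumerate(ordered):
        # A price is in effect from its timestamp until the next record,
        # clipped to the requested window.
        begins = max(record["Timestamp"], start_time)
        ends = ordered[i + 1]["Timestamp"] if i + 1 < len(ordered) else end_time
        ends = min(ends, end_time)
        seconds = max((ends - begins).total_seconds(), 0.0)
        total_cost += float(record["SpotPrice"]) / 3600.0 * seconds
    return total_cost
```

Dividing the result by the window length in hours gives the same average hourly cost as above.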

Upvotes: 4

Memos

Reputation: 464

I recently developed a small Python library that calculates the cost of a single EMR cluster, or of a list of clusters (given a period of days).

It takes into account Spot instances and Task nodes as well (that may go up and down while the cluster is still running).

In order to calculate the cost I use the bid price, which (in many cases) might not be the exact price that you end up paying for the instance. Depending on your bidding policy however, this price can be accurate enough.

You can find the code here: https://github.com/memosstilvi/emr-cost-calculator

Upvotes: 1

Aloisius

Reputation: 79

You can subscribe to the spot instance data feed to get the charges for your running instances dumped to an S3 bucket. Install the EC2 API tools and then run:

ec2-create-spot-datafeed-subscription -b bucket-to-dump-in

Note: you can have only one data feed subscription for your entire account.
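
If you'd rather not install the command line tools, the same subscription can be created from Python. A minimal sketch, assuming `ec2` is a boto3 EC2 client (e.g. boto3.client("ec2")) with permission to write to the bucket; the helper name subscribe_spot_datafeed is made up for illustration:

```python
def subscribe_spot_datafeed(ec2, bucket, prefix=None):
    """Create the account's spot data feed subscription and return its description."""
    # `ec2` is assumed to be a boto3 EC2 client; `prefix` is an optional key prefix.
    kwargs = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix
    response = ec2.create_spot_datafeed_subscription(**kwargs)
    return response["SpotDatafeedSubscription"]
```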

In about an hour you should start seeing gzipped, tab-delimited files show up in the bucket that look something like this:

#Version: 1.0
#Fields: Timestamp UsageType Operation InstanceID MyBidID MyMaxPrice MarketPrice Charge Version
2013-05-20 14:21:07 UTC SpotUsage:m1.xlarge RunInstances:S0012  i-1870f27d  sir-b398b235    0.219 USD   0.052 USD   0.052 USD   1
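
To turn those files into a running total per instance, something like the following should work. It is a minimal sketch that assumes the field order shown in the #Fields header above, tab-separated data rows, and charges all in the same currency; the function name sum_spot_charges is made up for illustration.

```python
import csv
from collections import defaultdict


def sum_spot_charges(lines):
    """Return {instance_id: total charge} from spot data feed lines."""
    totals = defaultdict(float)
    # Skip the "#Version" / "#Fields" header lines; data rows are tab-delimited.
    data_lines = (line for line in lines if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) < 8:
            continue  # ignore blank or truncated lines
        instance_id = row[3].strip()   # InstanceID column
        charge = row[7].strip()        # Charge column, e.g. "0.052 USD"
        totals[instance_id] += float(charge.split()[0])
    return dict(totals)
```

A downloaded feed file can be passed in directly as `lines`, for example by opening it with gzip.open(path, "rt").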

Upvotes: 3

Suman

Reputation: 9571

OK, I found a way to do this with the Boto library. This code is not perfect: Boto doesn't seem to return the exact time range, but it does return the historic spot prices more or less within the range. The following code seems to work quite well. If anyone can improve on it, that would be great.

import boto, datetime, time

# Enter your AWS credentials
aws_key = "YOUR_AWS_KEY"
aws_secret = "YOUR_AWS_SECRET"

# Details of instance & time range you want to find spot prices for
instanceType = 'm1.xlarge'
startTime = '2012-07-01T21:14:45.000Z'
endTime = '2012-07-30T23:14:45.000Z'
aZ = 'us-east-1c'

# Some other variables
maxCost = 0.0
minTime = float("inf")
maxTime = 0.0
totalPrice = 0.0
oldTimee = 0.0

# Connect to EC2
conn = boto.connect_ec2(aws_key, aws_secret)

# Get prices for instance, AZ and time range
prices = conn.get_spot_price_history(instance_type=instanceType, 
  start_time=startTime, end_time=endTime, availability_zone=aZ)

# Output the prices
print("Historic prices")
for price in prices:
  timee = time.mktime(datetime.datetime.strptime(price.timestamp, 
    "%Y-%m-%dT%H:%M:%S.000Z" ).timetuple())
  print("\t" + price.timestamp + " => " + str(price.price))
  # Get max and min time from results
  if timee < minTime:
    minTime = timee
  if timee > maxTime:
    maxTime = timee
  # Get the max cost
  if price.price > maxCost:
    maxCost = price.price
  # Calculate total price
  if not (oldTimee == 0):
    totalPrice += (price.price * abs(timee - oldTimee)) / 3600
  oldTimee = timee

# Difference b/w first and last returned times
timeDiff = maxTime - minTime

# Output aggregate, average and max results
print("For: one %s in %s" % (instanceType, aZ))
print("From: %s to %s" % (startTime, endTime))
print("\tTotal cost = $" + str(totalPrice))
print("\tMax hourly cost = $" + str(maxCost))
print("\tAvg hourly cost = $" + str(totalPrice * 3600 / timeDiff))
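
One possible improvement: time.mktime interprets the parsed time as local time, although the timestamps are UTC. Differences between two timestamps usually come out the same, but they can be off by an hour across a daylight saving change. A minimal sketch of a UTC-safe conversion with calendar.timegm, assuming the same timestamp format as above (the helper name utc_timestamp_to_epoch is made up for illustration):

```python
import calendar
import time


def utc_timestamp_to_epoch(timestamp):
    """Convert e.g. '2012-07-01T21:14:45.000Z' to seconds since the epoch (UTC)."""
    parsed = time.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.000Z")
    # timegm treats the struct_time as UTC, unlike time.mktime (local time).
    return calendar.timegm(parsed)
```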

Upvotes: 6
