Coleman S

Reputation: 498

Scale down specific Heroku worker dynos?

I'm building a web application whose core feature lets users upload large images and have them processed. The processing takes roughly 3 minutes to complete, and I thought Heroku would be an ideal platform for running these processing jobs on-demand and in a highly scalable way. The processing task itself is fairly computationally expensive and needs to run on a high-end PX dyno. I want to maximize parallelization, and minimize (effectively eliminate) the time a job spends waiting in a queue. In other words, I want N PX dynos for N concurrent jobs.

Thankfully, I can accomplish this pretty easily with Heroku's API (or optionally a service like Hirefire). Whenever a new processing request comes in, I can simply increment the worker count and the new worker will grab the job from the queue and start processing immediately.
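
For concreteness, the scale-up call looks roughly like this with Heroku's platform-api Ruby gem (a sketch; the app name and token env var are placeholders for your own values):

require 'platform-api'

# Sketch of the scale-up side using the platform-api gem. APP_NAME and
# HEROKU_API_TOKEN are placeholders for your own app and OAuth token.
APP_NAME = "my-app"
heroku = PlatformAPI.connect_oauth(ENV.fetch("HEROKU_API_TOKEN"))

# Read the current worker count and add one dyno for the incoming job.
count = heroku.formation.info(APP_NAME, "worker")["quantity"]
heroku.formation.update(APP_NAME, "worker", {"quantity" => count + 1})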

However, while scaling up is painless, scaling down is where the trouble starts. The Heroku API is frustratingly limited: I can only set the number of running workers, not kill specific idle ones. This means that if I have 20 workers each processing an image and one completes its task, I cannot safely scale the worker count to 19, because Heroku will kill an arbitrary worker dyno, regardless of whether it's actually in the midst of a job! Leaving all workers running until all jobs complete is simply out of the question, because the cost would be astronomical. Imagine 100 workers created during a spike idling indefinitely as a few new jobs trickle in throughout the day!

I've scoured the web, and the best "solution" people suggest is to have your worker process gracefully handle termination. Well, that's perfectly fine if your worker is just doing mass-emailing, but my workers are doing very drawn-out analytics on images which, as I mentioned above, take about 3 minutes to complete.
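
For reference, "graceful handling" boils down to trapping SIGTERM; Heroku follows SIGTERM with SIGKILL roughly 30 seconds later, which is exactly why it doesn't help a 3-minute job (a sketch; next_job and process are hypothetical helpers):

# What "graceful termination" looks like: trap SIGTERM and stop taking
# new work. Heroku sends SIGKILL ~30 seconds after SIGTERM, so a
# 3-minute job cannot finish inside that window.
shutting_down = false
Signal.trap("TERM") { shutting_down = true }

until shutting_down
  job = next_job                 # hypothetical: pop the next queued image
  job ? process(job) : sleep(1)  # hypothetical processing call
end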

In an ideal world, I could kill a specific worker dyno upon completion of its task. This would make scaling down just as easy as scaling up.

In fact, I've come close to that ideal world by switching from worker dynos to one-off dynos (which terminate upon process termination, i.e. you stop paying for the dyno after its "root program" exits). However, Heroku sets a hard limit of 5 one-off dynos that can run simultaneously. This I can understand, as I was certainly in a sense abusing one-off dynos...but it is quite frustrating nonetheless.
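
For what it's worth, spinning up one of those one-off dynos via the API looks roughly like this (a sketch; "heroku" and APP_NAME are as in the earlier snippet, and the rake task and job_id are hypothetical):

# Sketch of the one-off-dyno variant via the platform-api gem. Billing
# stops when the command exits. The rake task name is hypothetical.
heroku.dyno.create(APP_NAME, {
  "command" => "rake images:process[#{job_id}]",
  "size"    => "PX",    # the high-end dyno size the job needs
  "attach"  => false    # detached, like `heroku run:detached`
})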

Is there any way I can better scale down my workers? I would prefer not to have to radically re-engineer my processing algorithm by splitting it up into a few chunks that each run in 30-40 seconds, as opposed to one 3-minute stretch (that way, accidentally killing a running worker wouldn't be catastrophic). That approach would drastically complicate my processing code and introduce several new points of failure. However, if it's my only option, I'll have to do it.

Any ideas or thoughts are appreciated!

Upvotes: 12

Views: 1959

Answers (4)

fearless_fool

Reputation: 35239

Schedule a cleanup task

Summary: Queue a task to run at the lowest priority. Once all other tasks have completed, the cleanup task will run.

Details

[NOTE: after I wrote this answer, I realized that it doesn't address the need to spin down a specific worker dyno. But you should be able to exploit the key technique shown here: queue a low(er)-priority DJ task to clean up when everything else has been processed.]

I've had good luck using Heroku's platform-api gem to spin up Delayed Job workers on demand and spin them down when they finish. To simplify things, I created a heroku_control.rb file as follows.

My app only needed one worker; I recognize that your requirements are significantly more involved, but any app can exploit this one trick: queue a low-priority task to shut down the worker dyno(s) after all other delayed job tasks have been processed.

require 'platform-api'

# Simple class to interact with Heroku's platform API, allowing
# you to start and stop worker dynos under program control.
class HerokuControl

  API_TOKEN = "<redacted>"
  APP_NAME = "<redacted>"

  def self.heroku
    @heroku ||= PlatformAPI.connect_oauth(API_TOKEN)
  end

  # Spin up one worker dyno
  def self.worker_up(act = Rails.env.production?)
    self.worker_set_quantity(1) if act
  end

  # Spin down all worker dynos
  def self.worker_down(act = Rails.env.production?)
    self.worker_set_quantity(0) if act
  end

  def self.worker_set_quantity(quantity)
    heroku.formation.update(APP_NAME, 'worker', {"quantity" => quantity.to_s})
  end

end

And in my app, I do something like this:

LOWEST_PRIORITY = 100

def start_long_process
  queue_lengthy_process
  queue_cleanup_task        # clean up when everything else is processed
  HerokuControl::worker_up  # assure there is a worker dyno running
end

def queue_lengthy_process
  # do long job here...
end
handle_asynchronously :queue_lengthy_process, :priority => 1

# This gets processed when Delayed::Job has nothing else
# left in its queue.
def queue_cleanup_task
  HerokuControl::worker_down # shut down all worker dynos
end
handle_asynchronously :queue_cleanup_task, :priority => LOWEST_PRIORITY

Hope this helps.

Upvotes: 2

Lawrence

Reputation: 10770

It is now possible to shut down a specific dyno using the heroku ps:stop command.

e.g. if your heroku ps output contains:

web.1: up 2017/09/01 13:03:50 -0700 (~ 11m ago)
web.2: up 2017/09/01 13:03:48 -0700 (~ 11m ago)
web.3: up 2017/09/01 13:04:15 -0700 (~ 11m ago)

you can run heroku ps:stop web.2 to kill the second dyno in the list.

This won't do exactly what you want, because Heroku will immediately start up a new dyno to replace the one that was shut down. But perhaps that is still useful to you (or other people reading this question).
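
The same stop can also be triggered programmatically through the Platform API (a sketch, assuming the platform-api gem's dyno-stop endpoint; "heroku" is a connected PlatformAPI client and APP_NAME a placeholder, as in the other answer):

# Sketch: stop a specific dyno by name via the platform-api gem.
# Caveat as noted above: the formation quantity is unchanged, so
# Heroku immediately starts a replacement dyno.
heroku.dyno.stop(APP_NAME, "web.2")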

Upvotes: 0

André

Reputation: 2142

This is what Heroku's support answered about this:

I'm afraid this isn't possible at the moment. When scaling down your workers, we will stop the one with the highest number, so we don't have to change the public name for those dynos, and you don't get numbering holes.

I found this comment interesting in this context, although it did not really solve this issue.

Upvotes: 3

Valevalorin

Reputation: 420

I know you mentioned graceful termination, but I assume you meant graceful termination in the sense of a worker being killed off when the API sets the number of workers. Why not just add logic so that each worker kills itself once its job has completed?
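
Roughly like this, assuming the Platform API's dyno-stop endpoint (a sketch; on Heroku the DYNO env var holds the dyno's own name, e.g. "worker.3", and "heroku"/APP_NAME are a connected client and app name as in the other answers):

# Sketch of a worker stopping its own dyno after finishing a job.
# Caveat: unless the formation quantity is also decremented, Heroku
# will restart the stopped dyno rather than retire it.
heroku.dyno.stop(APP_NAME, ENV.fetch("DYNO"))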

Upvotes: 0
