Matias Fernandez

Reputation: 128

Rails Initializer: An infinite loop in a separate thread to update records in the background

I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:

# in config/initializers/item_loop.rb

Thread.new do
  loop do
    Item.find_each do |item|
      # Get price from third-party api and update record.
      item.update_price!
      # Need to wait a little between requests to avoid getting throttled.
      sleep 5
    end
  end
end

I usually accomplish this by running batch updates in recurring background jobs. But that doesn't make sense here: I don't really need parallelization, downtime, or queueing. I just want to update one item at a time in a single thread, forever.

Yet there are multiple things that concern me:

  1. Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
  2. Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
  3. Deadlock: Should I use Rails.application.executor.wrap?
  4. Concurrent Ruby/Sleep Intervals: Should I use TimerTask from the concurrent-ruby gem instead of sleep, or something other than Thread.new?

Information on any of these subjects is appreciated.

Upvotes: 2

Views: 1829

Answers (2)

Fumisky Wells

Reputation: 1199

I use rails runner combined with a process supervisor (the god gem or k8s) in a similar case of ours.

rails runner runs in a separate process, so we do not have to worry about connection leaks or thread safety.

The god gem or k8s handles concurrency and monitors job failures. Running one process with a specific sleep time keeps us within the third-party API's throttle limits (running N processes with N API keys could speed things up).

I think deadlock can happen in any concurrent situation.

I do not think this loop + sleep approach is a design flaw, because:

  • cron always starts on schedule, so long-running jobs could end up running simultaneously, and we would need to add logic to avoid overlapping jobs. A plain loop + sleep, by contrast, keeps maximum throughput without any job overlap.
  • ActiveJob is good for one-shot long-running tasks, but it is not a good fit for a daemon.
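
A minimal sketch of what such a loop + sleep script might look like. Only update_price! comes from the question; the method name run_price_loop and its parameters (stop, pause, log) are assumptions for illustration, not an existing API. The sleep and the stop check are injectable so the loop can be exercised without waiting or hitting a real API.

```ruby
# Hypothetical script body, e.g. started with `bin/rails runner script/price_loop.rb`.
# Only `update_price!` comes from the question; all other names are illustrative.

# Repeatedly walks over `items`, updating one item at a time and pausing
# `interval` seconds after each request. Returns the number of successful updates.
def run_price_loop(items, interval: 5, stop: -> { false },
                   pause: ->(seconds) { sleep(seconds) },
                   log: ->(message) { warn(message) })
  updated = 0
  until stop.call
    seen_any = false
    items.each do |item|
      break if stop.call
      seen_any = true
      begin
        item.update_price!
        updated += 1
      rescue StandardError => e
        # A single failing item should not kill the daemon: log it and move on.
        log.call("price update failed: #{e.class}: #{e.message}")
      end
      pause.call(interval)
    end
    # With no items at all, still pause so the outer loop does not spin.
    pause.call(interval) unless seen_any || stop.call
  end
  updated
end
```

In the real script this could be called as run_price_loop(Item.find_each, stop: -> { stopping }), with trap("TERM") { stopping = true } set up beforehand so the supervisor can shut the process down cleanly. find_each returns an Enumerator when called without a block, so each pass should query the table again.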

Upvotes: 0

Semjon

Reputation: 1023

Usually, a background worker manager is used to perform a job in a background (non-web-server) process. Rails has a specific interface for such managers called ActiveJob. There are several implementations of a background worker manager: Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred.

Returning to the actual problem: you can create a schedule that runs a price-update job at a fixed interval using the sidekiq-scheduler gem. Another nice extension, for throttling Sidekiq workers, is sidekiq-throttler.

Some code snippets:

# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
  include Sidekiq::Worker

  sidekiq_options throttle: { threshold: 720, period: 1.hour }

  def perform(item_id)
    Item.find(item_id).update_price!
  end
end

# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
  include Sidekiq::Worker

  def perform
    Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
  end
end

# config/sidekiq.yml
:schedule:
  update_price:
    cron: '0 */4 * * *'   # Runs once every 4 hours - depends on how many Items there are
    class: UpdatePriceMasterWorker

The idea of this setup: we run the master worker every 4 hours (the interval depends on how long it takes to update all items). The master worker enqueues one job per item to update that item's price. UpdatePriceWorker is throttled to a maximum of 720 runs per hour, i.e. one every 5 seconds, which matches the sleep 5 in the question.
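
To pick the cron interval, the arithmetic behind the throttle can be sketched as follows. The helper names are hypothetical and exist only for this illustration; they are not part of Sidekiq or sidekiq-throttler.

```ruby
# Hypothetical helpers for sizing the schedule; not part of any gem.

# Average spacing between requests implied by an hourly throttle.
def seconds_between_requests(per_hour)
  3600.0 / per_hour
end

# Whole hours needed to push `item_count` jobs through the throttle
# (integer ceiling division).
def hours_for_full_pass(item_count, per_hour)
  (item_count + per_hour - 1) / per_hour
end
```

With the 720 per hour threshold above, a 4-hour schedule covers at most 720 * 4 = 2880 items. With more items than that, the next run of the master worker would start enqueueing before the previous batch has drained, so the cron interval should be widened accordingly.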

Upvotes: 1
