Reputation: 324
I have a status dashboard that shows the status of remote hardware devices that 'ping' the application every minute and log their status.
class Sensor < ActiveRecord::Base
  has_many :logs

  def most_recent_log
    logs.order("id DESC").first
  end
end

class Log < ActiveRecord::Base
  belongs_to :sensor
end
Given I'm only interested in showing the current status, the dashboard only shows the most recent log for each sensor. This application has been running for a long time now and there are tens of millions of Log records.
The problem I have is that the dashboard takes around 8 seconds to load. From what I can tell, this is largely because there is an N+1 Query fetching these logs.
Completed 200 OK in 4729.5ms (Views: 4246.3ms | ActiveRecord: 480.5ms)
I do have the following index in place:
add_index "logs", ["sensor_id", "id"], :name => "index_logs_on_sensor_id_and_id", :order => {"id"=>:desc}
My controller / lookup code is the following:
class SensorsController < ApplicationController
  def index
    @sensors = Sensor.all
  end
end
I had thought of putting a latest_log_id reference onto Sensor and then updating it every time a new log for that sensor is posted - but something in my head is telling me that other developers would say this is a bad thing. Is this the case?
How are problems like this usually solved?
Upvotes: 1
Views: 296
Reputation: 35533
There are two relatively easy ways to do this:
Basic ActiveRecord approach:
subquery = Log.group(:sensor_id).select("MAX(id)")
@sensors = Sensor.eager_load(:logs).where(logs: { id: subquery })
Note that you should NOT use your most_recent_log method for each sensor (that would trigger the N+1 again), but rather logs.first. Only the latest log for each sensor will actually be prefetched into the logs collection.
Rolling your own may be more efficient from a SQL perspective, but more complex to read and use:
@sensors = Sensor.all
logs = Log.where(id: Log.group(:sensor_id).select("MAX(id)"))
@sensor_logs = logs.each_with_object({}) do |log, hash|
  hash[log.sensor_id] = log
end
@sensor_logs is a Hash, permitting a fast lookup of the latest log by sensor.id.
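The build-a-lookup-table step can be sketched in plain Ruby without ActiveRecord. The Struct below is a hypothetical stand-in for Log rows that have already been filtered down to one per sensor:

```ruby
# Hypothetical stand-in for Log rows already reduced to the latest per sensor.
LatestLog = Struct.new(:id, :sensor_id, :status)

logs = [
  LatestLog.new(101, 1, "online"),
  LatestLog.new(102, 2, "offline"),
  LatestLog.new(103, 3, "online")
]

# Same reduction as in the answer: key each log by its sensor_id.
# (With ActiveSupport loaded, logs.index_by(&:sensor_id) is equivalent.)
sensor_logs = logs.each_with_object({}) do |log, hash|
  hash[log.sensor_id] = log
end

sensor_logs[2].status  # O(1) lookup by sensor id
```

The dashboard view can then look up each sensor's status with a constant-time hash access instead of a per-sensor query.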
Regarding your comment about storing the latest log id - you are essentially asking whether you should build a cache. The answer is 'it depends'. Caching has many advantages and many disadvantages, so it comes down to whether the benefit is worth the cost. From what you are describing, it doesn't sound like you are familiar with the difficulties caches introduce (Google 'cache invalidation') or whether they apply in your case. I'd recommend against it until you can demonstrate that a) it adds real value over a non-cache solution, and b) it can be applied safely in your scenario.
Upvotes: 2
Reputation: 13952
There are three options:

The first is explained in PinnyM's answer.
You can do a join from the Sensor to just the latest Log record for each row, so everything gets fetched in one query. Not sure offhand how that will perform with the number of rows you have; likely it will still be slower than you want.
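The "latest log per sensor" reduction that such a join performs in SQL (a greatest-n-per-group query) can be sketched in plain Ruby with hypothetical in-memory rows; in production you would want the database to do this server-side:

```ruby
# Hypothetical in-memory stand-ins for Log rows.
LogRow = Struct.new(:id, :sensor_id, :status)

rows = [
  LogRow.new(1, 1, "online"),
  LogRow.new(2, 2, "online"),
  LogRow.new(3, 1, "offline"),  # newer log for sensor 1
  LogRow.new(4, 2, "offline")   # newer log for sensor 2
]

# Greatest-n-per-group: keep only the row with the highest id per sensor_id.
# This mirrors what a join against MAX(id) GROUP BY sensor_id does in SQL.
latest_per_sensor = rows.group_by(&:sensor_id)
                        .transform_values { |logs| logs.max_by(&:id) }

latest_per_sensor[1].status  # => "offline"
```

Doing this in Ruby over tens of millions of rows would be far too slow, which is why the grouping belongs in the database, backed by the composite index on (sensor_id, id).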
The thing you mentioned - caching the latest_log_id (or even caching just the latest_status, if that's all you need for the dashboard) is actually OK. It's called denormalization, and it's a useful thing if used carefully. You've likely come across "counter cache" plugins for Rails, which are in the same vein - duplicating data in the interests of read performance.
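A minimal sketch of the denormalization idea, in plain Ruby rather than ActiveRecord (the SensorRecord/LogRecord classes are hypothetical stand-ins; in Rails the pointer update would typically live in an after_create callback or in the code path that records a ping):

```ruby
# Hypothetical stand-in for a Sensor row with a denormalized latest-log pointer.
class SensorRecord
  attr_accessor :id, :latest_log

  def initialize(id)
    @id = id
  end
end

# Hypothetical stand-in for a Log row.
class LogRecord
  attr_reader :id, :sensor, :status

  def initialize(id, sensor, status)
    @id = id
    @sensor = sensor
    @status = status
    # Denormalization: update the pointer at write time, so the dashboard
    # read is O(1) per sensor instead of a MAX(id)-per-sensor query.
    sensor.latest_log = self
  end
end

sensor = SensorRecord.new(1)
LogRecord.new(1, sensor, "online")
LogRecord.new(2, sensor, "offline")

sensor.latest_log.status  # the dashboard reads this directly
```

The trade-off is the classic one for any cache: the write path takes on the duty of keeping the pointer correct, and every code path that creates a Log must go through it.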
Upvotes: 1