Reputation: 438
I have a Rails application where a query takes far too long. It uses a PostgreSQL DB, and the query reads from a single table with thousands of records.
statistics_controller.rb
all_data = Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
                    .where(Usagedata.arel_table[:wall_duration].not_eq("0"))
                    .in_range(@from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end
@data = all_data = all_data.to_a # forcing the query to run
What I'd like to do is keep the main query result (without the in_range and user conditions) in the Rails server-side application cache and refresh the data once per hour.
The part of the code that should be cached:
Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
         .where(Usagedata.arel_table[:wall_duration].not_eq("0"))
Usage of the cached records:
In addition to that, the client can select date ranges from a calendar (@from_date and @to_date). The time period between the dates can be anywhere from one day to about three years. (This is why the cache should store all records from the DB table.) The data is used to draw charts and to show/calculate top-user statistics.
I've tried @MrTheWalrus's solution:
@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
           .where(Usagedata.arel_table[:wall_duration].not_eq("0")).all
end
But this way I can't get my scoped query to work:
all_data = @statistics.in_range(@from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end
This gives me an error:
undefined method `in_range' for #<Array:0x007fa5ecc77588>
Although I have in_range defined in the Usagedata model like this:
def self.in_range(from_date, to_date)
  where("start_time <= :to AND end_time >= :from", :from => from_date, :to => to_date)
end
What am I doing wrong?
EDIT: Thanks to @Craig Ringer's solution I've managed to solve the indexing issue described here:
The whole application seems to be really slow. What am I doing wrong? I probably need to add indexes as well, but how?
Usagedata Load (243.4ms)  SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
EXPLAIN (1.9ms)  EXPLAIN SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
EXPLAIN for: SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
QUERY PLAN
---------------------------------------------------------------------------------------
Seq Scan on usagedata (cost=0.00..4558.02 rows=7989 width=34)
Filter: ((wall_duration <> 0) AND (start_time <= '2013-09-02 20:59:59.999999'::timestamp without time zone) AND (end_time >= '2013-05-05 21:00:00'::timestamp without time zone))
(2 rows)
Upvotes: 1
Views: 1355
Reputation: 9700
Craig Ringer's comment already discussed indexing, so I'm just going to talk about caching.
The problem with the caching code you include is that the thing you're caching is an ActiveRecord::Relation - basically just a SQL query waiting to run, not the results of that query. Caching the relation means that each time it's loaded from the cache, it still has to execute the query, which is the part that takes a long time. Tack an .all onto the end to force the query to actually run - that will make sure that the results are cached, not the query:
@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id]).
    where(Usagedata.arel_table[:wall_duration].not_eq("0")).all
end
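Since you mentioned refreshing once per hour, you could also pre-warm the cache from a scheduled job instead of waiting for it to expire. A minimal sketch using the Whenever gem - the UsagedataCacheWarmer class is a hypothetical name, not something in your app:
# config/schedule.rb (Whenever DSL)
every 1.hour do
  runner "UsagedataCacheWarmer.refresh"
end

# app/models/usagedata_cache_warmer.rb (hypothetical helper)
class UsagedataCacheWarmer
  def self.refresh
    # Overwrite the cached result set so readers never hit a cold cache;
    # the expiry is just a safety net in case the job stops running.
    Rails.cache.write('usagedata',
      Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
               .where(Usagedata.arel_table[:wall_duration].not_eq("0")).all,
      :expires_in => 2.hours)
  end
end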
Edit:
The reason you can't call .in_range on this is that .in_range modifies the query (by adding a WHERE clause). Once you've run the query and cached the results, you can't modify it in that way - the whole point of caching query results is that you run the query once and use the results multiple times; if the query changes, that's not an option.
Assuming that indexing hasn't already solved your problem, my suggestion is that you filter the results in Ruby rather than in the database. Assuming you've already populated the cache (via Whenever or in some other manner):
from_time = 1.week.ago
to_time = 1.day.ago
@statistics = Rails.cache.fetch('usagedata')
# Mirrors the in_range scope: start_time <= :to AND end_time >= :from
@filtered_statistics = @statistics.select do |item|
  item.start_time <= to_time && item.end_time >= from_time
end
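The same approach works for the user filter. Your by_user scope isn't shown, so this sketch assumes it matches on local_user_id - adjust the comparison to whatever by_user actually does:
if @user
  # Assumed in-Ruby equivalent of the by_user scope
  @filtered_statistics = @filtered_statistics.select { |item| item.local_user_id == @user.id }
end
Filtering a few thousand cached records in Ruby is usually fast; the trade-off only tips back toward the database once the table grows much larger.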
Upvotes: 2
Reputation: 324551
A partial index on (start_time, end_time) with a WHERE ("usagedata"."wall_duration" != 0) index filter clause will make this query a lot quicker. Or even a non-partial index on (start_time, end_time).
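As a sketch, assuming Rails 4+, where add_index accepts a :where option for PostgreSQL partial indexes (on Rails 3 you'd issue the equivalent CREATE INDEX via execute):
class AddUsagedataTimeIndex < ActiveRecord::Migration
  def change
    # Partial index: only rows that can ever match the query get indexed
    add_index :usagedata, [:start_time, :end_time],
              :name => 'index_usagedata_on_times_nonzero_wall',
              :where => 'wall_duration <> 0'
  end
end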
This may render client-side caching unnecessary. If not, look into whether Rails has any support for creating and managing server-side materialized views.
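Rails has no built-in support for those, but since PostgreSQL 9.3 you can manage one yourself with raw SQL in a migration. A minimal sketch (the view name usagedata_nonzero is made up):
class CreateUsagedataNonzeroView < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE MATERIALIZED VIEW usagedata_nonzero AS
      SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id
      FROM usagedata
      WHERE wall_duration <> 0;
    SQL
  end

  def down
    execute 'DROP MATERIALIZED VIEW usagedata_nonzero;'
  end
end
You'd then refresh it periodically (e.g. from the same hourly job) with REFRESH MATERIALIZED VIEW usagedata_nonzero;.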
Upvotes: 1