Raccoon

Reputation: 438

How to optimize/cache Rails query

I have a Rails application where a query takes far too long. It uses a PostgreSQL database, and the query runs against a single table with thousands of records.

statistics_controller.rb

all_data = Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
    .where(Usagedata.arel_table[:wall_duration].not_eq("0"))
    .in_range(@from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end

all_data = all_data.to_a # Force the query to run now
@data = all_data

What I'd like to do is keep the main query result (without the in_range and user conditions) in the Rails server-side application cache and refresh the data once per hour.

The part of the code that should be cached:

Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
    .where(Usagedata.arel_table[:wall_duration].not_eq("0"))

Usage of the cached records:

In addition, the client can select date ranges from a calendar (@from_date and @to_date). The time period between the dates can range from one day to roughly three years. (This is why the cache should store all records from the DB table.) The data is used to draw charts and to show/calculate top-user statistics.

I've tried @MrTheWalrus's solution:

@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
    .where(Usagedata.arel_table[:wall_duration].not_eq("0")).all
end

But this way I can't get my scopes to work on the cached result:

all_data = @statistics.in_range(@from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end

This gives me an error:

undefined method `in_range' for #<Array:0x007fa5ecc77588>

I do have in_range defined in the Usagedata model, like this:

def self.in_range(from_date, to_date)
  where("start_time <= :to AND end_time >= :from", :from => from_date, :to => to_date)
end
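
For reference, by_user is defined along the same lines (a minimal sketch; it assumes local_user_id is the foreign key to the user and that @user responds to #id):

# Sketch of the by_user scope referenced above;
# assumes usagedata.local_user_id stores the user's id.
def self.by_user(user)
  where(:local_user_id => user.id)
end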

What am I doing wrong?

EDIT: Thanks to @Craig Ringer's solution, I've managed to solve the indexing issue described here:

The whole application seems really slow. What am I doing wrong? I probably need to add indexes as well, but how?

  Usagedata Load (243.4ms)  SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
  EXPLAIN (1.9ms)  EXPLAIN SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')

EXPLAIN for: SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata"  WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
QUERY PLAN
 ---------------------------------------------------------------------------------------
 Seq Scan on usagedata  (cost=0.00..4558.02 rows=7989 width=34)
 Filter: ((wall_duration <> 0) AND (start_time <= '2013-09-02 20:59:59.999999'::timestamp without time zone) AND (end_time >= '2013-05-05 21:00:00'::timestamp without time zone))
 (2 rows)

Upvotes: 1

Views: 1355

Answers (2)

MrTheWalrus

Reputation: 9700

Craig Ringer's comment already discussed indexing, so I'm just going to talk about caching.

The problem with the caching code you included is that the thing you're caching is an ActiveRecord::Relation - basically just a SQL query waiting to run, not the results of that query. Caching the relation means that each time it's loaded from the cache, it still has to execute the query, and the query is the slow part. Tack an .all onto the end to force the query to actually run; that makes sure the results are cached, not the query:

@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id]).
    where(Usagedata.arel_table[:wall_duration].not_eq("0")).all
end

Edit: The reason you can't call .in_range on the cached value is that .in_range modifies the query (by adding a WHERE clause). Once you've run the query and cached its results, you can't modify it that way. The whole point of caching query results is to run the query once and use the results multiple times; if the query changes, that's not an option.

Assuming that adding an index hasn't already solved your problem, my suggestion is to filter the results in Ruby rather than in the database. Assuming you've already populated the cache (via Whenever or in some other manner):

from_time = 1.week.ago
to_time = 1.day.ago
# fetch without a block returns nil if the key is missing or
# expired, so the cache must already be populated at this point.
@statistics = Rails.cache.fetch('usagedata')
@filtered_statistics = @statistics.select do |item|
  item.start_time < to_time && item.end_time > from_time
end
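
If you also need the question's user filter, the same Ruby-side approach works on the cached array (a sketch, assuming @user responds to #id and matches local_user_id):

# Apply the user filter in Ruby as well;
# assumes @user.id matches usagedata.local_user_id.
if @user
  @filtered_statistics = @filtered_statistics.select do |item|
    item.local_user_id == @user.id
  end
end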

Upvotes: 2

Craig Ringer

Reputation: 324551

A partial index on (start_time, end_time) with a WHERE ("usagedata"."wall_duration" != 0) index filter clause will make this query a lot quicker. Even a plain, non-partial index on (start_time, end_time) would help.
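
In a Rails migration, that partial index could be created with raw SQL (a sketch; the index name is arbitrary, and partial indexes are PostgreSQL-specific):

class AddRangeIndexToUsagedata < ActiveRecord::Migration
  def up
    # Partial index matching the query's wall_duration filter;
    # PostgreSQL can use it for the date-range conditions too.
    execute <<-SQL
      CREATE INDEX index_usagedata_on_range
      ON usagedata (start_time, end_time)
      WHERE wall_duration <> 0;
    SQL
  end

  def down
    execute "DROP INDEX index_usagedata_on_range;"
  end
end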

This may render client-side caching unnecessary. If not, look into whether Rails has any support for creating and managing server-side materialized views.
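
For the materialized-view route, a raw-SQL migration sketch (assumes PostgreSQL 9.3+; the view name usagedata_nonzero is arbitrary, and the view must be refreshed periodically, e.g. hourly):

class CreateUsagedataMatview < ActiveRecord::Migration
  def up
    # Materialized view precomputing the non-zero-duration rows.
    # Refresh periodically with: REFRESH MATERIALIZED VIEW usagedata_nonzero;
    execute <<-SQL
      CREATE MATERIALIZED VIEW usagedata_nonzero AS
      SELECT start_time, end_time, node_count, processors, id,
             wall_duration, local_user_id
      FROM usagedata
      WHERE wall_duration <> 0;
    SQL
  end

  def down
    execute "DROP MATERIALIZED VIEW usagedata_nonzero;"
  end
end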

Upvotes: 1
