Laurens

Reputation: 2420

How to optimize Ruby on Rails statistics per month query

I developed an application that gives logged-in users an overview of statistics per month.

This is my current approach:

Statistics.html.haml:

-# @parsed_months is an array of month names.

- @parsed_months.each do |month|
  = render :partial => "statistic", :locals => {:month => month}

_statistic.html.haml:

%tr{:class => cycle("odd", "even")}
  %td= l(month, :format => "%B").capitalize
  %td= current_user.total_views_count(month)
  %td= current_user.total_leads_count(month)
  %td= current_user.total_clicks_count(month)

The method that returns the total views (in User.rb):

def total_views_count(month = nil)
  if month == nil
    v = 0
    self.companies.each { |c| v += c.counts.size }
    return v
  else
    v = 0
    self.companies.each do |c|
      v += c.counts.where(:created_at => Date.today.beginning_of_year..Date.today.end_of_year)
                   .where(:created_at => month.beginning_of_month..month.end_of_month)
                   .size
    end
    return v
  end
end

Company.rb:

belongs_to :user
has_many :counts, :as => :countable, :dependent => :destroy

Count.rb:

belongs_to :countable, :polymorphic => true

User.rb:

has_many :companies

This was performing well, but after a few months the Count model has grown to more than a million records, causing request timeouts on Heroku.

What can I do to optimize these queries, or is there a better approach?

Thanks in advance!

Upvotes: 1

Views: 970

Answers (2)

rubish

Reputation: 10907

You should pay attention to the following to optimize the queries:

  1. Try to reduce the number of queries in a single request
  2. Optimize your indexes
  3. Create rollup tables

Points 2 and 3 are the same as mentioned by @opensourcechris.

I have not worked with ActiveRecord for quite some time, so I cannot give you the ARel syntax of the query, but the main problem is that there is a lot of data and you are doing many heavy queries in a single request. You should use joins to reduce the number of queries and use indexes carefully to make the joins and the query optimal. The query with joins would look something like the following:

SELECT count(c.id) FROM users u
  JOIN companies comp ON comp.user_id = u.id
  JOIN counts c ON c.countable_id = comp.id
                   AND c.countable_type = 'Company'
                   AND c.created_at BETWEEN date_range
  WHERE u.id = current_user_id

You can also use GROUP BY here to retrieve data for all the months in a single query yet retain the counts on a monthly basis.
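A rough ActiveRecord equivalent of that grouped query, as an untested sketch (Rails 3 hash syntax to match the question; MONTH() assumes MySQL, adjust for Postgres):

Count.joins("JOIN companies ON companies.id = counts.countable_id
             AND counts.countable_type = 'Company'")
     .where("companies.user_id = ?", current_user.id)
     .where(:created_at => Date.today.beginning_of_year..Date.today.end_of_year)
     .group("MONTH(counts.created_at)")
     .count
# => a hash keyed by month number, e.g. {1 => 1200, 2 => 980, ...}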

For the joins to work effectively, you should have an index on companies.user_id and a compound index on (counts.countable_id, counts.countable_type, counts.created_at).
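A hypothetical migration for those indexes might look roughly like this (the class and index names are just illustrative):

class AddStatisticsIndexes < ActiveRecord::Migration
  def change
    # speeds up the users -> companies join
    add_index :companies, :user_id
    # covers the polymorphic join columns plus the created_at range filter
    add_index :counts, [:countable_id, :countable_type, :created_at],
              :name => "index_counts_on_countable_and_created_at"
  end
end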

That should do it for now, but as counts has grown to more than a million rows in just a few months, this is not going to solve the problem in the long run. As the counts table grows, even this query will start slowing down. In a relational database, query times increase almost linearly with the number of rows, but after a certain threshold they start growing at a more rapid rate. So it is always wise to contain the size of the tables you need to work with often. That is where rollup tables come into the picture.

With this amount of data coming in, insert speed is also a concern, so you should probably create a table without any indexes and log all the count data in that table. Data can be rolled up into other tables at regular intervals. The rollup tables can be created at whatever granularity is desired for reporting; common options are hourly, daily, weekly, monthly and yearly rollup tables.
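Purely as an illustration (the monthly_counts table and MonthlyCount model are assumed names, not something from the question), a monthly rollup run from a scheduled task could look roughly like this:

# Assumed schema: monthly_counts(company_id, month, views)
class MonthlyCount < ActiveRecord::Base
  belongs_to :company
end

month = Date.today.prev_month.beginning_of_month
Company.find_each do |company|
  views = company.counts.where(:created_at => month..month.end_of_month).count
  MonthlyCount.create!(:company_id => company.id, :month => month, :views => views)
end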

Data can also be dumped into an archive table to keep historical records, so that rollup tables can be recreated at any time with a different granularity or to meet other requirements. Once data has been dumped into the archive table, it can be purged from the main table so that insert speed is not compromised over time. It also allows you to log every view without worrying about constraints like a 10-minute rule, because the data can be cleansed before rolling up.
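A minimal sketch of that archive-and-purge step, assuming a counts_archive table with the same columns as counts:

# move everything older than the current month into the archive, then purge it
cutoff = Date.today.beginning_of_month
ActiveRecord::Base.connection.execute(
  "INSERT INTO counts_archive SELECT * FROM counts WHERE created_at < '#{cutoff.to_s(:db)}'"
)
Count.where("created_at < ?", cutoff).delete_all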

PS: I guess you should use a session_id (UUID) along with the IP address to correctly count the views. Generally a public IP address is shared among many internet users.

Upvotes: 1

OpenCoderX

Reputation: 6318

To optimize the queries you should start by auditing the indexes on each table. Since your WHERE clause is on a date field, I think indexes will work well. Use this to view your indexes:

USE *database*;
SHOW INDEX FROM *tablename*;

Then make sure you are indexing your WHERE columns.

Another option is to stop calculating totals for historic months on the fly: calculate them once at the end of each month and store them in a new summary table. Then all you will be calculating on the fly is the data for the current month; previous months can be returned from your new summary table.
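A hedged sketch of what the lookup could then become (MonthlySummary and its columns are assumed names, not existing code):

def total_views_count(month)
  if month < Date.today.beginning_of_month
    # historic months come straight from the precomputed summary table
    MonthlySummary.where(:user_id => id, :month => month.beginning_of_month).sum(:views)
  else
    # only the current month is calculated on the fly from the raw counts
    companies.to_a.sum do |c|
      c.counts.where(:created_at => month.beginning_of_month..month.end_of_month).count
    end
  end
end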

Upvotes: 0
