don_Bigote

Reputation: 947

How can I speed up this query in a Rails app?

I need help optimizing a series of queries in a Rails 5 app. The following explains what I am doing, but if it isn't clear, let me know and I will try to explain in more detail.

I have the following methods in my models:

In my IncomeReport model:

class IncomeReport < ApplicationRecord

  def self.net_incomes_2015_totals_collection
    all.map(&:net_incomes_2015).compact
  end

  def net_incomes_2015
    (incomes) - producer.expenses_2015
  end

  def incomes
    total_yield * 1.15
  end

end

In my Producer model I have the following:

class Producer < ApplicationRecord

  def expenses_2015
    expenses.sum(&:expense_per_ha)
  end

end

In the Expense model I have:

class Expense < ApplicationRecord

  def expense_per_ha
    total_cost / area
  end

end

In the controller I have this (I am using a gem called descriptive_statistics to get the min, max, quartiles, etc., in case you are wondering about that part at the end):

@income_reports_2015 = IncomeReport.net_incomes_2015_totals_collection.extend(DescriptiveStatistics)

Then in my view I use

<%= @income_reports_2015.descriptive_statistics[:min] %> 

This code works when there are only a few objects in the database. However, now that there are thousands, the query takes forever to return a result. It takes so long that it times out!

How can I optimize this to get the most performant outcome?

Upvotes: 1

Views: 562

Answers (3)

Siggy

Reputation: 81

It looks like you have a few N+1 queries here. Each report grabs its producer in an individual query. Then, each producer grabs each of its expenses in a separate query.

You could avoid the first issue by using preload(:producer) instead of the bare all. However, the sums later will be harder to avoid, since sum automatically fires its own query for each producer.
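A minimal sketch of that first fix (the method and association names come straight from the question) might look like:

def self.net_incomes_2015_totals_collection
  # Eager-load each report's producer and that producer's expenses,
  # so the Ruby-side math below doesn't issue one query per record.
  preload(producer: :expenses).map(&:net_incomes_2015).compact
end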

You can avoid the sum issue as well with something like

def self.net_incomes_2015_totals_collection
  joins(producer: :expenses).
    select('income_reports.id', 'income_reports.total_yield * 1.15 - SUM(expenses.total_cost / expenses.area) AS net_incomes_2015').
    group('income_reports.id').
    map(&:net_incomes_2015).
    compact
end

to get everything in one query.

Upvotes: 1

razvans

Reputation: 3251

What's the structure of IncomeReport? Looking at the code, your problem lies in the all in net_incomes_2015_totals_collection. all hits the database and returns every record, and then you map over them in Ruby. Overkill. Try to filter the data, query less, select less, and get all the info you want directly with ActiveRecord. Ruby loops slow things down.

So, without knowing the table structure and its data, I'd do the following:

def self.net_incomes_2015_totals_collection
  # start_of_year_2015/end_of_year_2015 stand in for the 2015 date range
  where(created_at: start_of_year_2015..end_of_year_2015)
    .where.not(net_incomes_2015: nil)
    .pluck(:net_incomes_2015)
end

Plus, I'd make sure there's a composite index on created_at and net_incomes_2015.
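A minimal migration sketch for that index (this assumes net_incomes_2015 is stored as a column on income_reports, which the query above already presupposes, and a Rails 5.0 migration version):

class AddCreatedAtAndNetIncomes2015IndexToIncomeReports < ActiveRecord::Migration[5.0]
  def change
    # Composite index covering the created_at range filter and the
    # net_incomes_2015 NOT NULL check / pluck above.
    add_index :income_reports, [:created_at, :net_incomes_2015]
  end
end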

It will probably still be slow, but better than it is now. You should think about aggregating the data in the background (Resque, Sidekiq, etc.) at midnight (and caching it?).
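As a rough sketch of that background-aggregation idea, assuming Sidekiq and with the worker name and cache key invented for illustration (scheduling it for midnight is left to whatever scheduler you already use):

class NetIncomes2015AggregationWorker
  include Sidekiq::Worker

  def perform
    # Precompute the collection off the request cycle and cache the
    # resulting array so the controller only reads the cache.
    totals = IncomeReport.net_incomes_2015_totals_collection
    Rails.cache.write('net_incomes_2015_totals', totals, expires_in: 24.hours)
  end
end

The controller would then read the cached array (falling back to the live query if the cache is empty) before extending it with DescriptiveStatistics.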

Hope it helps.

Upvotes: 1

PrimeTimeTran

Reputation: 2137

One approach might be to architect your application differently. I think a service-oriented architecture might be of use in this circumstance.

Instead of querying when the user goes to this view, you might want to use a worker to run the query intermittently and write the results to a CSV. Then, when a user navigates to this view, you read from the CSV instead. This would run much faster because, instead of doing the query then and there (when the user navigates to this page), you're simply reading from a file that was created earlier by a background process.
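A hedged sketch of that worker-plus-CSV idea, assuming Sidekiq and with the job name and file path invented for illustration:

require 'csv'

class IncomeReportCsvWorker
  include Sidekiq::Worker

  def perform
    # Write one net income value per row; the controller only reads this file.
    path = Rails.root.join('tmp', 'net_incomes_2015.csv')
    CSV.open(path, 'w') do |csv|
      IncomeReport.net_incomes_2015_totals_collection.each { |value| csv << [value] }
    end
  end
end

The controller would then do something like CSV.read(path).flatten.map(&:to_f).extend(DescriptiveStatistics) instead of hitting the database.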

Obviously, this has its own set of challenges, but I've done this in the past to solve a similar problem. I wrote an app that fetched data from 10 different external APIs once a minute. The 10 fetches resulted in 10 objects in the DB, so 10 * 60 * 24 = 14,400 records per day. When a user loaded the page requiring this data, it pulled in 7 days' worth of records: 100,800 database rows. I ran into the same problem, where the query being done at runtime resulted in a timeout, so writing to a CSV and reading from it was my workaround.

Upvotes: 1
