Daniel Bonnell
Daniel Bonnell

Reputation: 4997

ActiveRecord order a relation with nulls last and non-null values sorted by secondary attribute

I've got a model, Prospect. I want to sort prospects by some value (e.g. updated_at ASC), however I also want to bisect the sorted list so that all prospects where lead_id IS NOT NULL are shown first, sorted by updated_at ASC, then all prospects where lead_id IS NULL are shown last sorted by updated_at ASC. I don't care what lead_id is, only that any records where lead_id IS NOT NULL are bumped to the beginning of the list. I need to be able to paginate the dataset and the query needs to be relatively efficient (turnaround time should be well under 500ms, ideally under 100ms).

First Approach

I first tried to accomplish this using the following query, however this doesn't work (for my use case) because prospects are sorted by lead_id (as you would expect from this query), which is unique, therefore making a secondary sort effectively useless.

Prospect.order("lead_id ASC nulls last, updated_at ASC")

Second Approach

I tried a (slightly modified) approach suggested by Andrey Deineko. This returns the entire dataset in the correct order, but there is no way to merge the two separate relations into a single relation that can be paginated. As a result, in order to paginate the dataset, I need to instantiate every row from the table into memory. That would be acceptable for a few dozen records at most, but certainly not 20k+.

# prospect.rb
scope :with_leads,    -> { where.not(lead_id: nil) }
scope :without_leads, -> { where(lead_id: nil) }
scope :ordered,       -> { order(:total_clicks_count) } 

[Prospect.with_leads.ordered, Prospect.without_leads.ordered].flatten

Third Approach

I realized that I can get a sorted list of prospect ids (both with and without a lead_id) and use that to get the full dataset ordered by id. This accomplishes what I need and works fine for a few dozen or hundred records, but isn't viable for 20k+ records.

lead_ids = Prospect.where.not(lead_id: nil).pluck(:id)
prospect_ids = Prospect.where(lead_id: nil).pluck(:id)
prospects = Prospect.order_by_ids([lead_ids, prospect_ids].flatten)

Here is the source for order_by_ids:

class ApplicationRecord < ActiveRecord::Base
    self.abstract_class = true

    def self.order_by_ids(ids)
        # https://stackoverflow.com/a/26777669/4139179
        order_by = ["CASE"]
        ids.each_with_index do |id, index|
            order_by << "WHEN #{self.name.underscore}s.id='#{id}' THEN #{index}"
        end

        order_by << "END"
        order(order_by.join(" "))
    end
end

The problem with this second approach is that it takes up to 1 second per 1000 records to build the list or ordered ids for each subset (prospects with and without a lead_id) and then use that to fetch the entire dataset in order.

Is there a better approach that will return the entire dataset (in a way that can be paginated) ordered by some attribute (e.g. updated_at ASC) where prospects with a lead_id are at the top of the list and those without are at the bottom?

Upvotes: 1

Views: 2295

Answers (1)

Andrey Deineko
Andrey Deineko

Reputation: 52357

When using ORDER BY "column" in PostgreSQL, NULL values will come last by default. So

Prospect.order(:lead_id, :updated_at)

should do the trick.

Your actual need:

The effect in practice would be that when a user views a list of their sales prospects, those that have been converted to leads will show up first in the list, even if the list is sorted by other attributes.

# prospect.rb
scope :with_leads,    -> { where.not(lead_id: nil) }
scope :without_leads, -> { where(lead_id: nil) }
scope :ordered,       -> { order(:total_clicks_count) } 

And then use these scopes to present to the user:

Prospect.with_leads.ordered.each do
  #...
end

Prospect.without_leads.ordered.each do
  #...
end

Upvotes: 1

Related Questions