Scope: Sort by difference between two dates

Question

I would like to create a scope which sort data from the closest date to the farthest.

For example I have 3 values:

Then I would like to sort the date closest to a given date: 2012-12-29.
I should have as result this order: 2, 3, 1.
If I choose 2012-12-30 the result must be: 3, 2, 1.

I tried something like this:

scope :order_by_closest_date, lambda{|time| 
  select("*, (date - DATE('#{time}')) AS time").order("time ASC")
}

But it doesn't work.
For information: Rails 3.2.9 Ruby 1.9.3 Postgresql 9.1.4.
Any ideas?

Erwin Brandstetter · Accepted Answer

Simple query

^{First example works with a date column according to initial question.}

Not sure about Ruby syntax, but a proper SQL statement would be:

SELECT * 
FROM   tbl
ORDER  BY @(date_col - '2012-12-29'::date)

@ being the "absolute value" operator.

Never use date or time as identifiers. While being allowed (with some restrictions) in PostgreSQL, these are reserved words in the SQL standard and it leads to confusing error messages and possibly unexpected errors.

Superior performance

^{The rest works with a timestamp column according to the update in the comment.}

For small tables or ad-hoc queries, the above solution is just fine. For medium or big tables, if performance matters, I suggest a more sophisticated approach.

Condicio sine qua non is an index on the date or timestamp column. Like this:

CREATE INDEX tbl_my_timestamp_idx ON tbl(my_timestamp);

With the index in place the following query which will nuke the performance of the simple one for bigger tables:

SELECT *
FROM  (
    (
    SELECT *
    FROM   tbl
    WHERE  my_timestamp >= '2012-12-30 11:32'::timestamp
    ORDER  BY my_timestamp
    LIMIT  3
    )

    UNION ALL
    (
    SELECT *
    FROM   tbl
    WHERE  my_timestamp < '2012-12-30 11:32'::timestamp
    ORDER  BY my_timestamp DESC
    LIMIT  3
    )
    ) x
ORDER  BY @extract('epoch' FROM (my_timestamp - '2012-12-28 11:32'::timestamp))
LIMIT  3;

Parenthesis around the two legs of the UNION ALL - SELECT are not optional. Needed to apply LIMIT to each leg.
If you order by additional columns, reflect that in your index - use a multi-column index in that case.

How so?

The first query uses an expression as condition. Postgres has to compute a value for every single row, then order by the result and pick the first few. No problem for a small table, but very expensive for big tables. O(n); n being the number of rows in the table. It can't use a plain index. Plus some non-trivial cost to sort and pick the winners among all rows.
You could create an index on an expression, which would be fastest, but that only works for a constant timestamp to compare to - hardly a realistic use case.

The second query finds the position according to your timestamp in the index, reads tuple pointers for next couple of rows sequentially and fetches them straight from the table (or even directly from the index with an index-only scan in 9.2). Twice, once up, once down, since we don't know how the peers compare. But that's just 2 x O(log(n)) (typical b-tree look-up cost) Computation is only done for the few pre-selected rows. Picking the winners from the small sample carries a trivial constant cost.

Just test with EXPLAIN ANALYZE. In a quick test on a real-life table I got a factor 1000 with a table of 50k rows. And it keeps scaling up for bigger tables.

Scope: Sort by difference between two dates

Answers (2)

Simple query

Superior performance

How so?

Related Questions