Reputation: 44735

Are three levels of select a performance issue?

I'm fairly new to Postgres, and I'm trying to implements some complex functionality.

The following query works:

select (
    select ts as start
    from (
        select *
        from events
        where (reason = 'START' or reason = 'STOP')
        and ts < ?
        and user = ?
        order by ts desc
        limit 1
    ) as subquery_start
    where reason = 'START'
    ),(
    select ts as stop
    from (
        select *
        from events
        where (reason = 'START' or reason = 'STOP')
        and ts > ?
        and user = ?
        order by ts
        limit 1
    ) as subquery_stop
    where reason = 'STOP'
);

It finds whether the user, at a time, is between START and STOP events, returning:

           start           |           stop            
---------------------------+---------------------------
 2018-06-01 10:44:55.52+01 | 2018-06-01 10:45:07.52+01
(1 row)

whether they are not:

 start | stop 
-------+------
       | 
(1 row)

or whether they're just after a START and a later STOP is not yet present:

           start           | stop           
---------------------------+------
 2018-06-01 10:44:55.52+01 | 
(1 row)

Is it possible to simplify such a query, given that I want one row returned, as in the examples above?

Will three levels of nested selects cause performance issues?

Upvotes: 0

Answers (2)

Haleemur Ali

Reputation: 28253

use window functions to avoid nesting altogether

SELECT 

    CASE WHEN FIRST_VALUE(reason) OVER w1 = 'START'
      THEN FIRST_VALUE(ts) OVER w1 
      ELSE NULL 
    END start, 
    CASE WHEN LAST_VALUE(reason) OVER w2 = 'STOP'
      THEN LAST_VALUE(ts) OVER w2 
      ELSE NULL 
    END stop
FROM events
WHERE (reason = 'START' or reason = 'STOP')  
  AND ts > ? AND ts < ?
  AND user = ?
WINDOW w1 AS (ORDER BY reason != 'START', ts)
       w2 AS (ORDER BY reason = 'STOP', ts)
LIMIT 1

This works because Boolean values are sorted False, True.

So, w1 puts all the rows where reason = 'START' at the top, and within that group sorts by ts. We pick out the ts in the first row with FIRST_VALUE. Similarly, we get the last ts where reason = 'STOP'

This detects the following cases:

No START or STOP reasons were found for user in the time range (no rows returned)
Either START or STOP (but not both) reasons were found for user in the time range (row is returned with 1 null)
The first START ts is after the last STOP ts (e.g. if the very last row ordered by ts is a only one where reason = 'START' (comparison of the returned row's start & stop values
The START ts < STOP ts as expected. (row is returned with both columns properly populated)

Upvotes: 0

Jasen

Reputation: 12402

It looks pretty good to me. the inner selects being limit 1 will use any index you have on the 'user,ts' column-pair,

The best index for this query would be on events(user,ts) where (reason = 'START' or reason = 'STOP') but on events(user,ts) should be almost as good.

possibly a clearer way to get a similar result would be

WITH subquery_before AS (
        select ts,reason
        from events
        where (reason = 'START' or reason = 'STOP')
        and ts < ?
        and user = ?
        order by ts desc
        limit 1
    ),
subquery_after AS (
        select ts,reason
        from events
        where (reason = 'START' or reason = 'STOP')
        and ts > ?
        and user = ?
        order by ts
        limit 1
    ) 
SELECT
  subquery_before.ts AS start,
  subquery_after.ts AS stop
WHERE  subquery_before.reason = 'START'
  AND  subquery_after.reason = 'STOP'

Upvotes: 1

Are three levels of select a performance issue?

Answers (2)

Related Questions