Syl
Syl

Reputation: 3829

Select by last relation or no relation

I have a relation between task and operations. I want to select the manual tasks without operation OR with a action != 'cancelled' for the last operation.

Example

tasks table :
-------------
|id | state |
| 3 |auto   |
| 5 |manual |
| 6 |manual |
| 2 |manual |
| 8 |manual |

operations table :
----------------------------------
|id | action    | task_id | time |
| 1 | processed | 8       | 8:00 |
| 2 | cancelled | 8       | 9:00 |
| 3 | processed | 6       | 8:00 |
| 4 | processed | 2       | 8:00 |
| 5 | cancelled | 2       | 7:00 |

result : tasks 5,6,2

Here is what I managed to do :

SELECT tasks.*
WHERE tasks.id NOT IN(
    SELECT operations.task_id)
OR tasks.id IN
    SELECT operations.task_id
    WHERE action!=cancelled
    .....
    .....
    ORDER BY time DESC

I am not sure how to end this query. Also it will perform 2 query on operations for each line in tasks.

Upvotes: 0

Views: 35

Answers (3)

John Bollinger
John Bollinger

Reputation: 180058

If you're willing to rely on 'cancelled' being the last operation performed on a task if ever it is performed -- that is, if cancellation puts a task in a terminal state, then don't need to focus on cancellation being the last operation, only on whether there is a cancellation at all.

In that case, you can use a fairly simple outer join to find your result:

select t.*
from
  tasks t
  left outer join operations o
    on t.id = o.task_id
       and o.action = 'cancelled'
where t.state = 'manual' and o.id is null;

Note in particular the o.id is null in the filter condition -- supposing that 'id' is the primary key for table operations, it can be null in the joined result only where no row of operations satisfies the join condition for a given row of tasks. That is, only for tasks that have no associated 'cancelled' operation (including those with no operations at all).

On the other hand, if you need to include tasks that have a 'cancelled' operation, but for which the last operation is something different, then you might use a window function to help you identify the last operation for each task:

select t.*
from
  tasks t
  left outer join (
      select
        operations.*,
        max(operations.time) over (partition by task_id) as last_time 
      from operations
    ) o
    on t.id = o.task_id
      and o.action = 'cancelled'
      and o.time = o.last_time
where t.state = 'manual' and o.id is null;

It is essential here that the o.time = o.last_time appear in the join condition, not a filter condition of the inline view (where it would be illegal) or the filter condition of the outer query (where it would have the wrong effect).

Upvotes: 1

Gab
Gab

Reputation: 3520

Let's say this is your schema:

CREATE TABLE tasks (
    id int unique,
    state text
);

CREATE TABLE operations (
    id serial,
    action text,
    task_id int REFERENCES tasks (id),
    time text
);

INSERT INTO tasks (id, state) VALUES (3, 'auto');
INSERT INTO tasks (id, state) VALUES (5, 'manual');
INSERT INTO tasks (id, state) VALUES (6, 'manual');
INSERT INTO tasks (id, state) VALUES (2, 'manual');
INSERT INTO tasks (id, state) VALUES (8, 'manual');

INSERT INTO operations (action, task_id, time) VALUES ('processed', 8, '8:00');
INSERT INTO operations (action, task_id, time) VALUES ('cancelled', 8, '9:00');
INSERT INTO operations (action, task_id, time) VALUES ('processed', 6, '8:00');
INSERT INTO operations (action, task_id, time) VALUES ('processed', 2, '8:00');
INSERT INTO operations (action, task_id, time) VALUES ('cancelled', 2, '7:00');

This would be your query:

SELECT *
FROM tasks
WHERE state = 'manual' AND
      (
        NOT EXISTS (SELECT 1 FROM operations WHERE task_id = tasks.id LIMIT 1)
        OR
        (SELECT action FROM operations WHERE task_id = tasks.id ORDER BY id DESC LIMIT 1) != 'cancelled'
      )
;

Output:

 id | state  
----+--------
  5 | manual
  6 | manual
(2 rows)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269463

Hmmm. I'm thinking distinct on to get the information about the operations:

select distinct on (task_id)
from operations o
order by task_id, time desc;

Then a left join to get the tasks:

select t.*
from tasks t left join
     (select distinct on (task_id)
      from operations o
      order by task_id, time desc
     ) o
     on o.task_id = t.task_id
where o.task_id is null or o.action <> 'cancelled';

Upvotes: 1

Related Questions