pu4cu
pu4cu

Reputation: 165

Get specific rows using 2 tables SQL

I have these 2 tables:

users

| id | name  |
--------------
| 1  | user1 |
| 2  | user2 |
| 3  | user3 |
| 4  | user4 |
| 5  | user4 |

user_tasks

| id  | user_id | status    | date_created         |
---------------------------------------------------
| 1   | 1       | started   | 2020-08-04 00:00:00 |
| 2   | 1       | pending   | 2020-08-04 00:00:00 |
| 3   | 1       | pending   | 2020-08-04 00:00:00 |
| 4   | 1       | pending   | 2020-08-04 00:00:00 |
| 5   | 1       | validated | 2020-08-04 00:00:00 |
| 6   | 1       | ended     | 2020-08-04 00:00:00 |
| 7   | 2       | started   | 2020-08-04 00:00:00 |
| 8   | 2       | pending   | 2020-08-04 00:00:00 |
| 9   | 2       | pending   | 2020-08-04 00:00:00 |
| 10  | 2       | pending   | 2020-08-04 00:00:00 |
| 11  | 2       | ended     | 2020-08-04 00:00:00 |
| 12  | 3       | started   | 2020-08-04 00:00:00 |
| 13  | 3       | pending   | 2020-08-04 00:00:00 |
| 14  | 3       | pending   | 2020-08-04 00:00:00 |
| 15  | 3       | pending   | 2020-08-04 00:00:00 |
| 16  | 3       | ended     | 2020-08-04 00:00:00 |

I would like to get all users that have a task with the statuses pending and ended, and NOT the validated status.

Simply put; I would like a result with which users have NOT validated a user_task.

In the case above, it should return users.id 2, 3 since they have not validated a task.

How would I go about doing this? So far I came up with this:

SELECT *
FROM 
    (
        SELECT id
        FROM users
        JOIN user_tasks ON users.id = user_tasks.user_id
        WHERE user_tasks.status = 'pending'
        GROUP BY id
    ) as query_1
JOIN
    (
        SELECT id
        FROM users
        JOIN user_tasks ON users.id = user_tasks.user_id
        WHERE user_tasks.status != 'validated'
        GROUP BY id
    ) as query_2
ON query_1.id = query_2.id

But this query also returns the users who have validated a task.

Also, I am not sure how to properly pose this question. If anyone has any feedback for a better title, please feel free to edit.

Upvotes: 0

Views: 27

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269753

specifically the users that do not have a task with validated status.

If I understand correctly, you can use not exists:

select u.*
from users u
where not exists (select 1
                  from user_tasks ut
                  where ut.user_id = u.id and ut.status = 'validated'
                 );

EDIT:

Based on your comment, I would go for aggregation:

select ut.user_id
from user_tasks ut
group by ut.user_id
having sum( ut.status = 'validated' ) = 0 and
       sum( ut.status = 'pending' ) > 0 and
       sum( ut.status = 'ended' ) > 0 ;
   

Upvotes: 1

rotten
rotten

Reputation: 1630

This is probably how I would do it if that was all I wanted out of the data and I wanted to do it quick:

with pending_users as (
   select
       user_id
   from
       user_tasks
   where
       status = 'pending'
)
select
   distinct
   u.name
from
   user_tasks ut
   join users u on ut.user_id = u.id
where
   ut.status = 'validated'
   and
   u.id in (select user_id from pending_users)

Maybe not the most efficient or elegant, but it is the first approach to this query that pops out of my head.

More likely I'd want to do deeper analysis, and maybe build a "funnel" data viz which showed each step and where users were dropping off. That would take a little bit more time than a quick off the top of my head approach. I'd probably do an array_agg and then sort the arrays and follow up with a count of the length of the arrays, and maybe a last_value() window function to see where things stand for that user.

Upvotes: 0

Related Questions