Iwishworldpeace

Reputation: 488

How to select rows with conditional values of one column in SQL

Say I have this table:

id timeline
1 BASELINE
1 MIDTIME
1 ENDTIME
2 BASELINE
2 MIDTIME
3 BASELINE
4 BASELINE
5 BASELINE
5 MIDTIME
5 ENDTIME
6 MIDTIME
6 ENDTIME
7 RISK
7 RISK

This is what the data looks like, except that the real table has more observations (a few thousand).

How do I get the output so that it will look like this:

id timeline
1 BASELINE
1 MIDTIME
2 BASELINE
2 MIDTIME
5 BASELINE
5 MIDTIME

How do I select the first two rows of each id that has two specific timeline values (BASELINE and MIDTIME)? Note that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.

I used

SELECT * 
FROM df 
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)

and got IDs with two timeline values (output below) but don't know how to get rows with only BASELINE and MIDTIME.

 id | timeline |
----|----------|
  1 | BASELINE |
  1 | MIDTIME  |
  2 | BASELINE |
  2 | MIDTIME  |
  5 | BASELINE |
  5 | MIDTIME  |
  6 | MIDTIME  |    ---- don't want this
  6 | ENDTIME  |    ---- don't want this
  7 | RISK     |    ---- don't want this
  7 | RISK     |    ---- don't want this

Many Thanks.

Upvotes: 3

Views: 2776

Answers (3)

e_i_pi

Reputation: 4820

There are probably a number of ways to do this. Here's one that picks up the BASELINE and MIDTIME rows for ids where both exist, ensuring there are only 2 rows per returned id. Without knowing the ordering of timeline, I don't think it's possible to go further:

SELECT
      id
    , timeline
FROM (
    SELECT
          *
        , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
        , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
AND subquery.MidtimeCount > 0
GROUP BY
      id
    , timeline
;
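
As a sanity check, here is a minimal sketch that runs the query above against the sample rows from the question. The question doesn't name a database, so the choice of an in-memory SQLite database driven from Python's sqlite3 module is purely my assumption, and window functions need SQLite 3.25 or newer. The trailing ORDER BY is also my addition, only there to make the output order deterministic.

```python
import sqlite3

# Sample rows copied from the question.
SAMPLE = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", SAMPLE)

QUERY = """
SELECT id, timeline
FROM (
    SELECT
          *
        , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END)
              OVER (PARTITION BY id) AS BaselineCount
        , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END)
              OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
  AND subquery.MidtimeCount > 0
GROUP BY id, timeline
ORDER BY id, timeline  -- added only for a stable output order
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this should print the BASELINE and MIDTIME rows for ids 1, 2 and 5 only. Ids 3 and 4 drop out because their MidtimeCount is 0, id 6 because its BaselineCount is 0, and id 7 because its rows never pass the inner WHERE.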

Upvotes: 0

Ivan Verges

Reputation: 641

I think this query should give you the result you want.

NOTE: As I understand it, you don't want any id that has an "ENDTIME" row, yet in your sample data there is an "ENDTIME" row for id 1. I assumed this was an error, so I wrote a query that excludes every id that has an "ENDTIME" (or "RISK") row.

WITH CTE AS
(
    SELECT
        id
    FROM
        df
    WHERE
        timeline IN ('ENDTIME', 'RISK')
)
SELECT
    id,
    timeline
FROM
    df
WHERE
    id NOT IN (SELECT id FROM CTE);

Upvotes: 0

Fahmi

Reputation: 37473

You can try using EXISTS with a correlated subquery:

    select * from t t1 where timeline in ('BASELINE','MIDTIME') and
    exists
     (select 1 from t t2 where t1.id=t2.id and timeline in ('BASELINE','MIDTIME')
            group by t2.id having count(distinct timeline)=2)

OUTPUT:

id  timeline
1   BASELINE
1   MIDTIME
2   BASELINE
2   MIDTIME
5   BASELINE
5   MIDTIME
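
If you want to reproduce this locally, here is a small sketch that runs the same query through Python's sqlite3 module. The in-memory SQLite database is my assumption (the question doesn't say which engine is in use), the table is named t as in the query above, and the ORDER BY is added only so the output order is stable. I also added one hypothetical id 8 with two BASELINE rows, which is not in the question, to show why the subquery uses count(distinct timeline) rather than count(*).

```python
import sqlite3

# Rows from the question, plus a hypothetical id 8 (NOT in the question)
# that has two BASELINE rows and no MIDTIME row.
SAMPLE = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
    (8, "BASELINE"), (8, "BASELINE"),  # hypothetical extra rows
]

# Assumption: SQLite in memory stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, timeline TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", SAMPLE)

QUERY = """
select * from t t1 where timeline in ('BASELINE','MIDTIME') and
exists
 (select 1 from t t2 where t1.id=t2.id and timeline in ('BASELINE','MIDTIME')
        group by t2.id having count(distinct timeline)=2)
order by id, timeline  -- added only for a stable output order
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Id 8 has two rows that pass the timeline filter, so a plain count(*)=2 would let it through; count(distinct timeline) sees only one distinct value and rejects it, the same way ids 3, 4 and 6 are rejected.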

Upvotes: 2
