Tiger Galo
Tiger Galo

Reputation: 351

OVER clause with PARTITION BY and ORDER BY to ignore the WHERE clause of the main query

A similar question had been asked at Get top 1 row of each group, however, in that scenario there was no WHERE condition which changes the expected output. So my case goes this way.

Imagine we have the following data table which I temporary generate and populate in the SQL query below. What I want is to find the first cycle record for a given name ignoring the where clause that comes at the end. There is a Cycle column which follows the date stamp order. Notice that the cycle numbers per Name not necessarily start from 1. So for each Name I wanted to know which record represents the first cycle based on the date stamp and cycle number, since multiple records with the same timestamp may have different cycle number; in that case the earliest record is the one with lower cycle number.

CREATE TABLE temp_table 
(
    Id INT,
    Cycle INT,
    DateStamp DateTime2,
    Name VARCHAR(255)
);

INSERT INTO temp_table (id, Cycle, DateStamp, Name)
VALUES  (1, 4, '2024-01-10T17:53:12', 'John'),
        (2, 5, '2024-01-10T17:53:14', 'John'),
        (3, 3, '2024-01-10T17:53:10', 'John'),
        (4, 7, '2024-01-10T17:53:14', 'John'),
        (5, 2, '2024-01-10T17:53:10', 'John'),
        (6, 2, '2024-01-10T17:53:11', 'George'),
        (7, 1, '2024-01-10T17:53:11', 'George'),
        (8, 4, '2024-01-10T17:53:13', 'George'),
        (9, 3, '2024-01-10T17:53:12', 'George'),
       (10, 7, '2024-01-10T17:53:11', 'Tom'),
       (11, 4, '2024-01-10T17:53:10', 'Tom'),
       (12, 8, '2024-01-10T17:53:12', 'Tom'),
       (13, 3, '2024-01-10T17:53:10', 'Tom'),
       (14, 5, '2024-01-10T17:53:11', 'Tom'),
       (15, 6, '2024-01-10T17:53:11', 'Tom'),
       (16, 9, '2024-01-10T17:53:12', 'Tom');
       
SELECT * FROM temp_table;

SELECT 
    *,
    CASE 
        WHEN FIRST_VALUE(Id) OVER (PARTITION BY Name ORDER BY DateStamp, Cycle) = Id 
            THEN 1 
            ELSE 0 
    END AS IsFirstCycle
FROM
    temp_table
WHERE 
    Name = 'Tom' --AND Id >= 14

DROP TABLE temp_table;

Everything works well and the query above clearly indicates which record for each name partition is representing the initial cycle based on the date and cycle number. In the example above I filtered to see only the final resulting records for Tom.

See the screenshot of the results:

enter image description here

However, my problem is when I uncomment and add another condition to filter the final results by record IDs to be over certain value (14 in the example above), then I still get a record marked as First Cycle for that sub-partition.

But what I was expecting and what I want to see is that no record in this sub-partition is the initial first cycle, since the first cycle record for me is the one over the full partition ignoring the WHERE clause for the whole query.

enter image description here

As far as I can read and understand from the internet, the OVER clauses should be WHERE clause agnostic. Sure, I want to filter at the very end by WHERE clause and show only records with IDs over whatever number, but when for the partitioning I need to detect whether a record is a first cycle or not, I want to consider not the subset but the complete partition in the full table ignoring any WHERE condition. Please, guide me how to achieve that, since the OVER clause clearly filters the partition records by the ID > 14 condition first before applying the OVER clause to detect and mark the first cycle record within that partition. My expectation was to return the three records in the second image with all of them having IsFirstCycle marked 0, since the only first cycle record in Tom partition is the one with Id=13 as seen in the first screenshot.

Upvotes: 1

Views: 362

Answers (1)

nbk
nbk

Reputation: 49373

You can use a subselect to get the lowest id for one(or more) users

SELECT *,
      CASE WHEN (SELECT MIN(id) FROM temp_table WHERE Name = t2.Name) = Id THEN 1 ELSE 0 END AS IsFirstCycle
FROM temp_table t2
WHERE Name = 'Tom' 
  AND Id >= 14

Id Cycle DateStamp Name IsFirstCycle
14 5 2024-01-10 17:53:11 Tom 0
15 6 2024-01-10 17:53:11 Tom 0
16 9 2024-01-10 17:53:12 Tom 0

You can change the inner SELECT to somthing that can better be sorted

like

SELECT *,
      CASE WHEN (
      SELECT id   
      FROM temp_table WHERE Name = t2.Name
      ORDER BY DateStamp LIMIT 1)      
      = Id THEN 1 ELSE 0 END AS IsFirstCycle
FROM temp_table t2
WHERE Name = 'Tom' 
  AND Id >= 14

Also can you use a CTE with rownumber as further possibility.

As your actual is more complex, but we don't know nothing about, you must add the other table and check for one user manually if the correct id is selected, else you must refine your apporach

Upvotes: 1

Related Questions