Dan
Dan

Reputation: 83

Determine Streak and Time in Streak

I have a simple table with buildID (UUID, not int), Repository, the date, and the result:

BuildID     Repository  BuildDateTime         Result
------------------------------------------------------
1234        Repo1       2017-01-02T01:23:56   Fail
2224        Repo1       2017-01-02T01:22:56   Fail
3234        Repo1       2017-01-02T01:20:56   Success
4234        Repo2       2017-01-01T01:20:00   Fail
5234        Repo2       2017-01-01T01:19:00   Success
6234        Repo2       2017-01-01T01:18:00   Success
7234        Repo3       2017-01-01T01:17:30   Success
8234        Repo2       2017-01-01T01:17:00   Success

I am wanting to determine, for each build, how many consecutive builds in a row that the repository's build has had the same result, and when it started the streak. I want to put this into a view so the results look like this:

BuildID  Repository  BuildDateTime         Result   Streak   StreakStartDateTime
-------------------------------------------------------------------------------
1234     Repo1       2017-01-02T01:23:56   Fail     2        2017-01-02T01:22:56
2224     Repo1       2017-01-02T01:22:56   Fail     1        2017-01-02T01:22:56
3234     Repo1       2017-01-02T01:20:56   Success  1        2017-01-02T01:20:56
4234     Repo2       2017-01-01T01:20:00   Fail     1        2017-01-01T01:20:00
5234     Repo2       2017-01-01T01:19:00   Success  3        2017-01-01T01:17:00
6234     Repo2       2017-01-01T01:18:00   Success  2        2017-01-01T01:17:00 
7234     Repo3       2017-01-01T01:17:30   Success  1        2017-01-01T01:17:30
8234     Repo2       2017-01-01T01:17:00   Success  1        2017-01-01T01:17:00

I've been experiementing with the lag function, which helps me understand the previous X rows. The problem is that the number of rows I need to go back to find the results varies from row to row. I am beginning to think it is a dead end.

Any help is appreciated.

Upvotes: 0

Views: 138

Answers (2)

Alex Fletcher
Alex Fletcher

Reputation: 56

Here's another way of doing it. Essentially assigning each build an order within the history and finding the oldest one in the history which shares the same status and doesn't have anything in between with a different status.

WITH CTE_Builds AS (
    SELECT *
    , RN = ROW_NUMBER() OVER (PARTITION BY Repository ORDER BY BuildDateTime)
    FROM BuildHistory
)
SELECT a.*
, Streak = ISNULL(a.RN - x.MIN_RN,0) + 1
, StreakStartDateTime = ISNULL(x.MIN_DATE, a.BuildDateTime)
FROM CTE_Builds a
OUTER APPLY (
    SELECT MIN_RN = MIN(RN),  MIN_DATE = MIN(BuildDateTime)
    FROM CTE_Builds b
    WHERE b.Repository = a.Repository
      AND b.RN < a.RN
      AND b.Result = a.Result
      AND NOT EXISTS (
        SELECT *
        FROM CTE_Builds c
        WHERE c.Repository = b.Repository
          AND c.RN BETWEEN b.RN and a.RN
          AND c.Result != b.Result
        )
    ) x
ORDER BY a.Repository, a.BuildDateTime DESC

Upvotes: 1

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

This can be done with lag.

select t.*
,row_number() over(partition by repository,grp order by buildDatetime) as streak
,min(buildDatetime) over(partition by repository,grp) as streakStartDatetime
from (select t.*
     ,sum(prev_same_or_not) over(partition by repository order by buildDatetime) as grp
      from (select t.*  
            ,case when lag(result) over(partition by repository order by buildDatetime) = result then 0 else 1 end as prev_same_or_not
            from tbl t
           ) t
     ) t

Looks overwhelming, but here is how it works.

  1. Check if the previous row has the same result as current and assign a flag accordingly.
  2. Use a running sum to classify consecutive results with same value into the same group.
  3. Get the min datetime per group and use row number to calculate streaks based on groups.

Upvotes: 3

Related Questions