MidnightThoughtful
MidnightThoughtful

Reputation: 233

Determine start of data's most recent uninterrupted 'streak' by date

I have a dataset that looks something like:

asset_id,date_logged
1234,2018-02-01
1234,2018-02-02
1234,2018-02-03
1234,2018-02-04
1234,2018-02-05
1234,2018-02-06
1234,2018-02-07
1234,2018-02-08
1234,2018-02-09
1234,2018-02-10
9876,2018-02-01
9876,2018-02-02
9876,2018-02-03
9876,2018-02-07
9876,2018-02-08
9876,2018-02-09
9876,2018-02-10

For the purpose of this exercise, imagine today's date is 2018-02-10 (10 Feb 2018). For all the asset_ids in the table, I am trying to identify the start of the most recent unbroken streak for date_logged.

For asset_id = 1234, this would be 2018-02-01. The asset_id was logged all 10 days in an unbroken streak. For asset_id = 9876, this would be 2018-02-07. Because the asset_id was not logged on 2018-02-04, 2018-02-05, and 2018-02-06, the most recent unbroken streak starts on 2018-02-07.

So, my result set would hopefully look something like:

asset_id,Number_of_days_in_most_recent_logging_streak
1234,10
9876,4

Or, alternatively:

asset_id,Date_Begin_Most_Recent_Streak
1234,2018-02-01
9876,2018-02-07

I haven't been able to work out anything that gets me close -- my best effort so far is to get the number of days since the first log date and today, and the number of days the asset_id appears in the dataset, and compare these to identify situations where the streak is more recent than the first day they appear. For my real dataset this isn't particularly problematic, but it's an ugly solution and I would like to understand a better way of getting to the outcome.

Upvotes: 0

Views: 70

Answers (3)

eifla001
eifla001

Reputation: 1157

you can try this,

with test (asset_id, date_logged) as
   (select 1234, date '2018-02-01' from dual union all
   select 1234, date '2018-02-02' from dual union all
   select 1234, date '2018-02-03' from dual union all
   select 1234, date '2018-02-04' from dual union all
   select 1234, date '2018-02-05' from dual union all
   select 1234, date '2018-02-06' from dual union all
   select 1234, date '2018-02-07' from dual union all
   select 1234, date '2018-02-08' from dual union all
   select 1234, date '2018-02-09' from dual union all
   select 1234, date '2018-02-10' from dual union all
   select 9876, date '2018-02-01' from dual union all
   select 9876, date '2018-02-02' from dual union all
   select 9876, date '2018-02-03' from dual union all
   select 9876, date '2018-02-07' from dual union all
   select 9876, date '2018-02-08' from dual union all
   select 9876, date '2018-02-09' from dual union all
   select 9876, date '2018-02-10' from dual union all
   select 9876, date '2018-02-11' from dual union all
   select 9876, date '2018-02-12' from dual
 ) 
SELECT asset_id, MIN(date_logged), COUNT(1)
  FROM (SELECT asset_id, date_logged, 
               MAX(date_logged) OVER (PARTITION BY asset_id)+1 max_date_logged_plus_one,
               DENSE_RANK() OVER (PARTITION BY asset_id ORDER BY date_logged desc) rown
          FROM test
         ORDER BY asset_id, date_logged desc)
 WHERE max_date_logged_plus_one - date_logged = rown 
 GROUP BY asset_id;


  ASSET_ID MIN(DATE_LOGGED)   COUNT(1)
  ---------- ---------------- ----------
        1234 01-FEB-18                10
        9876 07-FEB-18                 6

if below data is commented, output is

select 9876, date '2018-02-10' from dual union all

  ASSET_ID MIN(DATE_LOGGED)   COUNT(1)
---------- ---------------- ----------
      1234 01-FEB-18                10
      9876 11-FEB-18                 2

Upvotes: 1

user5683823
user5683823

Reputation:

Perhaps something like this. Break the query after each inline view in the WITH clause and SELECT * FROM the most recent inline view, to see what each step does.

with
  inputs ( asset_id, date_logged ) as (
    select 1234, to_date('2018-02-01', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-02', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-03', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-04', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-05', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-06', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-07', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-08', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-09', 'yyyy-mm-dd') from dual union all
    select 1234, to_date('2018-02-10', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-01', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-02', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-03', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-07', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-08', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-09', 'yyyy-mm-dd') from dual union all
    select 9876, to_date('2018-02-10', 'yyyy-mm-dd') from dual
  ),
  prep ( asset_id, date_logged, grp ) as (
    select asset_id, date_logged, 
           date_logged - row_number() 
                         over (partition by asset_id order by date_logged)
    from   inputs
  ),
  agg ( asset_id, date_logged, cnt ) as (
    select   asset_id, min(date_logged), count(*)
    from     prep
    group by asset_id, grp
  )
select   asset_id, max(date_logged) as date_start_recent_streak,
         max(cnt) keep (dense_rank last order by date_logged) as cnt
from     agg
group by asset_id
order by asset_id  -- If needed
;

  ASSET_ID DATE_START_RECENT_STREAK        CNT
---------- ------------------------ ----------
      1234 2018-02-01                       10
      9876 2018-02-07                        4

Upvotes: 2

Littlefoot
Littlefoot

Reputation: 143013

Would this make any sense?

SQL> with test (asset_id, date_logged) as
  2    (select 1234, date '2018-02-01' from dual union all
  3     select 1234, date '2018-02-02' from dual union all
  4     select 1234, date '2018-02-03' from dual union all
  5     select 1234, date '2018-02-04' from dual union all
  6     select 1234, date '2018-02-05' from dual union all
  7     select 1234, date '2018-02-06' from dual union all
  8     select 1234, date '2018-02-07' from dual union all
  9     select 1234, date '2018-02-08' from dual union all
 10     select 1234, date '2018-02-09' from dual union all
 11     select 1234, date '2018-02-10' from dual union all
 12     select 9876, date '2018-02-01' from dual union all
 13     select 9876, date '2018-02-02' from dual union all
 14     select 9876, date '2018-02-03' from dual union all
 15     select 9876, date '2018-02-07' from dual union all
 16     select 9876, date '2018-02-08' from dual union all
 17     select 9876, date '2018-02-09' from dual union all
 18     select 9876, date '2018-02-10' from dual
 19    ),
 20  inter as
 21    -- difference between DATE_LOGGED and its previous DATE_LOGGED
 22    (select asset_id,
 23       date_logged,
 24       date_logged - lag(date_logged) over (partition by asset_id order by date_logged) diff
 25     from test
 26    )
 27  select i.asset_id, min(i.date_logged) date_logged
 28  from inter i
 29  where nvl(i.diff, 1) = (select max(i1.diff) from inter i1
 30                          where i1.asset_id = i.asset_id
 31                         )
 32  group by i.asset_id
 33  order by i.asset_id;

  ASSET_ID DATE_LOGGE
---------- ----------
      1234 2018-02-01
      9876 2018-02-07

SQL>

Upvotes: 0

Related Questions