user3150002
user3150002

Reputation: 141

An Alternative to a Cursor For This?

I am trying to find a way to accomplish this without using a cursor.

I have a set of data that totals about 13 million records. The interval between one record and the next varies, but are all between 5 and 20 minutes. I need to create a new table of data, but select the data so that this is at least a 30 minute gap between one record and the next.

For example, if I have this:

VID | Datetime
1   |  2016-01-01 00:00
1   |  2016-01-01 00:10
1   |  2016-01-01 00:12
1   |  2016-01-01 00:25
2   |  2016-01-01 00:40
4   |  2016-01-01 01:00
4   |  2016-01-01 02:13
6   |  2016-01-01 02:23
7   |  2016-01-01 02:25
8   |  2016-01-01 02:49
9   |  2016-01-01 02:59
9   |  2016-01-01 03:01
9   |  2016-01-01 03:09
9   |  2016-01-01 03:24
9   |  2016-01-01 04:05

The new table would look like this:

VID | Datetime
1   |  2016-01-01 00:00
2   |  2016-01-01 00:40
4   |  2016-01-01 02:13
8   |  2016-01-01 02:49
9   |  2016-01-01 03:24

I can do this with a cursor, but for millions of records that's crazy. I've seen mentions of something called a quirky update for similar situations, but I'm not sure what that is.

Currently using SQL Server 2014. Any help would be greatly appreciated.

Upvotes: 2

Views: 185

Answers (2)

Richard Borgen Hansen
Richard Borgen Hansen

Reputation: 26

Things would be a lot easier if you had an identifier column instead of [vid]. If that was the case then you could do this:

with mycte (id, mydate, keepthis, offset)
as
(
    select
        id,
        mydate,
        1 keepthis,
        0 offset
    from mytable where id = 1
    union all
    select
        t.id,
        t.mydate,
        case when datediff(mi, o.mydate, t.mydate)+o.offset >= 30 then 1 else 0 end keepthis,
        case when datediff(mi, o.mydate, t.mydate)+o.offset >= 30 then 0 else datediff(mi, o.mydate, t.mydate)+o.offset end
    from mytable t join mycte o on t.id = o.id+1
)

select id,mydate from mycte where keepthis=1

Upvotes: 1

SMM
SMM

Reputation: 2235

Thinking a little outside the box, if it is permissible to just get the earliest row in each 30 min interval then I have a solution that works.

Caveats:

  • I assumed at this point your data might go back as far as 30 years. 13 million intervals approaches the limit of the way the tally table is configured so if you go over 16 mil you will need to make changes
  • You may need to break up the CTEs as temp tables and add indexes to get decent performance on that data :-)

--

--setup data
declare @t table (VID int, [Datetime] datetime);
insert  @t values
        (1, '1986-01-01 00:00'),    --very early year
        (1, '2016-01-01 00:10'),
        (1, '2016-01-01 00:12'),
        (1, '2016-01-01 00:25'),
        (2, '2016-01-01 00:40'),
        (4, '2016-01-01 01:00'),
        (4, '2016-01-01 02:13'),
        (6, '2016-01-01 02:23'),
        (7, '2016-01-01 02:25'),
        (8, '2016-01-01 02:49'),
        (9, '2016-01-01 02:59'),
        (9, '2016-01-01 03:01'),
        (9, '2016-01-01 03:09'),
        (9, '2016-01-01 03:24'),
        (9, '2016-01-01 04:05');
select  * from @t order by VID, [Datetime];
--select    datediff(MI, (select min([Datetime]) from @t), (select max([Datetime]) from @t));   --15778325 records in 30 years - handled by t4 x t4 x t4 in tally generator

-- Tally generator courtesy of http://www.sqlservercentral.com/blogs/never_say_never/2010/03/19/tally_2D00_table_2D00_cte/
-- Tally Table CTE script (SQL 2005+ only)
-- You can use this to create many different numbers of rows... for example:
-- You could use a 3 way cross join (t3 x, t3 y, t3 z) instead of just 2 way to generate a different number of rows.
-- The # of rows this would generate for each is noted in the X3 comment column below.
-- For most common usage, I find t3 or t4 to be enough, so that is what is coded here.
-- If you use t3 in ‘Tally’, you can delete t4 and t5.
; WITH
    -- Tally table Gen            Tally Rows:     X2                X3
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 4            ,    8
t2 AS (SELECT 1 N FROM t1 x, t1 y),            -- 16            ,    64
t3 AS (SELECT 1 N FROM t2 x, t2 y),            -- 256            ,    4096
t4 AS (SELECT 1 N FROM t3 x, t3 y),            -- 65536        ,    16,777,216
t5 AS (SELECT 1 N FROM t4 x, t4 y),            -- 4,294,967,296,    A lot
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
          FROM t4 x, t4 y, t4 z), -- Change the t3's to one of the other numbers above for more/less rows
--generate time values
Intervals as (
        select  t.N - 1 interval,
                dateadd(mi, (t.N - 1) * 30, min_date.min_date) interval_start,
                dateadd(mi, (t.N) * 30, min_date.min_date) next_interval_start
        from    (
                select  min([Datetime]) min_date
                from    @t
                ) min_date
        join    Tally t
                on  t.N <= datediff(MI, (select min([Datetime]) from @t), (select max([Datetime]) from @t)) / 30 + 1
),
--join intervals to data tables
Intervaled_data as (
        select  *, row_number() over (partition by i.interval order by t.[Datetime]) row_num
        from    @t t
        join    Intervals i
                on  t.[Datetime] >= i.interval_start and t.[Datetime] < i.next_interval_start
)
select  i.VID, i.[Datetime]
from    Intervaled_data i
where   i.row_num = 1
order   by i.VID, i.[Datetime];

Upvotes: 1

Related Questions