joe_coolish
joe_coolish

Reputation: 7259

SQL close close Gaps in data over time

I have a table of play data that I'm using for a prototype. I'm generating the data while I'm at work, but when I leave and my machine goes to sleep, the data generation stops. This has cause large gaps in my collection of items.

I would like to be able to shift the values of each item in the DateTimeCreated collumn of my table so that there isn't a gap of more than 10 minutes between any item and the next generated item.

The structure of the table is like this:

CREATE TABLE [dbo].[Items](
    [Id] [uniqueidentifier] NOT NULL,
    [DateTimeCreated] [datetimeoffset](7) NOT NULL,
    [AuthorId] [uniqueidentifier] NOT NULL,
    [Source] [varchar](max) NOT NULL,       
    [FullText] [varchar](max) NOT NULL,
 CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I was thinking about doing this in L2S, but I have over 1 million records, so IDK if that is the best solution (iterating over each item). I know there has to be some way to do this in SQL that will be much faster.

Upvotes: 3

Views: 293

Answers (2)

J Cooper
J Cooper

Reputation: 4998

An alternative Ranking-Functions Approach (not 100% tested):

DECLARE @tenMinutes AS INT = 600;


WITH StartingPoints AS
(
    SELECT DateTimeCreated, ROW_NUMBER() OVER(ORDER BY DateTimeCreated) AS rownum
    FROM dbo.Items AS A
    WHERE NOT EXISTS(
        SELECT * FROM dbo.Items AS B
        WHERE B.DateTimeCreated < A.DateTimeCreated 
          AND DATEDIFF(SECOND,B.DateTimeCreated, A.DateTimeCreated) BETWEEN 0 AND @tenMinutes
    )
),
EndingPoints AS
(
    SELECT DateTimeCreated, ROW_NUMBER() OVER(ORDER BY DateTimeCreated) AS rownum
    FROM dbo.Items AS A
    WHERE NOT EXISTS(
        SELECT * FROM dbo.Items AS B
        WHERE A.DateTimeCreated < B.DateTimeCreated 
          AND DATEDIFF(SECOND,A.DateTimeCreated, B.DateTimeCreated) BETWEEN 0 AND @tenMinutes
    )
),
Islands AS
(
    SELECT S.DateTimeCreated AS start_range,
           E.DateTimeCreated AS end_range,
           ROW_NUMBER() OVER(ORDER BY S.DateTimeCreated) AS row_num
    FROM StartingPoints AS S
    JOIN EndingPoints AS E on E.rownum = S.rownum
),
Ofs AS
(
    SELECT I2.start_range, 
           I2.end_range,  
           I1.end_range AS prev,
           DATEDIFF(SECOND, I1.end_range, I2.start_range) AS offset 
    FROM Islands AS I1
    JOIN Islands AS I2 ON I2.row_num = I1.row_num + 1 OR I2.row_num IS NULL
),
CmlOfs AS
(
    SELECT O1.start_range,
           O1.end_range,
           O1.prev,
           O1.offset,
           (SELECT SUM(O2.offset) FROM Ofs AS O2
            WHERE O2.start_range <= O1.start_range) AS cum_offset
    FROM Ofs AS O1
),
UpdateQ AS
(
    SELECT Items.*, DATEADD(SECOND, -1 * CmlOfs.cum_offset, Items.DateTimeCreated) AS new_value
    FROM Items
    JOIN CmlOfs ON Items.DateTimeCreated BETWEEN CmlOfs.start_range AND CmlOfs.end_range
)
UPDATE UpdateQ
SET DateTimeCreated = new_value;

Upvotes: 1

MatBailie
MatBailie

Reputation: 86715

Make sure to have an index on DateTimeCreated if you want this to be anything other than a pig.

It also assumes (as you said in your comment) there are few gaps compared to total number of records.

WITH
  gap (Start,Finish)
AS
(
  SELECT
    DateTimeCreated,
    (SELECT MIN(DateTimeCreated) FROM items AS lookup WHERE DateTimeCreated > DateTimeCreated)
  FROM
    items
  WHERE
    DATEADD(second, 600, DateTimeCreated) < (SELECT MIN(DateTimeCreated) FROM items AS lookup WHERE DateTimeCreated > DateTimeCreated)

  UNION ALL

  SELECT
    MAX(DateTimeCreated),
    MAX(DateTimeCreated)
  FROM
    items
)
,
  offset (Start,Finish,Offset)
AS
(
  SELECT
    [current].Start,
    (SELECT MIN(Start) FROM gap WHERE Start > [current].Start),
    DATEDIFF(second, Start, Finish) - 600
  FROM
    gap      AS [current]
)
,
  cumulative_offset (Start,Finish,Offset)
AS
(
  SELECT
    [current].Start,
    [current].Finish,
    SUM([cumulative].Offset)
  FROM
    offset    AS [current]
  INNER JOIN
    offset    AS [cumulative]
      ON [cumulative].Start <= [current].Start
)

UPDATE
  items
FROM
  cumulative_offset
SET
  DateTimeCreated = DATEADD(second, -Offset, DateTimeCreated)
INNER JOIN
  items
    ON  items.DateTimeCreated >  cumulative.Start
    AND items.DateTimeCreated <= cumulative.Finish

Upvotes: 1

Related Questions