Tom Raganowicz
Tom Raganowicz

Reputation: 2470

How to find gaps of data and insert NULL data points instead having gap

I have a column set (let's say column A and column B). Each column can have multiple points (datetime and value). Each column set have predefined timestep. For below example it's: 600sec.

columnA

dateTime - '2013-10-05 00:10:00' 
Value - 0.51231
dateTime - '2013-10-05 00:20:00' 
Value - 0.54123
dateTime - '2013-10-05 00:30:00' 
Value - 0.53312
...
dateTime - '2013-11-14 13:30:00' 
Value - 0.53312

etc.

columnB

dateTime - '2013-10-05 00:10:00' 
Value - 0.51231
dateTime - '2013-10-05 00:20:00' 
Value - 0.54123
dateTime - '2013-10-05 00:50:00' 
Value - 0.54123
...
dateTime - '2013-11-14 13:20:00' 
    Value - 0.53312
dateTime - '2013-11-14 13:30:00' 
Value - 0.53312
dateTime - '2013-11-14 13:40:00' 
Value - 0.53312

I need to find missing data points for each column and insert NULL points so all columns would have exact amount of points.

For above examples missing data points are:

columnA - 2013-11-14 13:40:00
columnB - 2013-10-05 00:30:00, 2013-10-05 00:40:00

I know I can iterate using T-SQL or PHP (client side) for each column for given column set and check each point, but it would be very slow I think. Especially PHP solution.

There is possibly smart SQL that would be able to select missing dateTimes (gaps).

Any ideas?

UPDATE: Because it's not clear about the DB structure. Column is a field within the table, so, let's have a table: tblValues. There are 3 fields: columnName, datetime and value. In our examples columnName would be the columnA or columnB.

Upvotes: 0

Views: 569

Answers (3)

Ben Thul
Ben Thul

Reputation: 32737

It's not too bad. The key here is to keep a numbers table around. It's just a table with one column and the numbers 1 through as many as you want in it. They're incredibly useful. Here's my favorite method to create one (courtesy of Itzik Ben-Gan):

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number INTO dbo.Numbers from Tally where Number <= 1000000

Now that you've got that, it's just a matter of using the numbers table to generate where the data should be and then left join it to what you actually have. Something like this:

WITH [data] AS (SELECT * FROM (VALUES 
    ('2013-10-05 00:10:00', 0.51231),
    ('2013-10-05 00:20:00', 0.54123),
    ('2013-10-05 00:30:00', 0.53312),
    ('2013-10-05 00:50:00', 0.54123)
) AS x(d, v)),
[times] AS (
    SELECT TOP 10 DATEADD(MINUTE, 10*[Number], '2013-10-05') AS [d]
    FROM dba.dbo.[Numbers] AS n
)
SELECT * FROM [times] 
LEFT JOIN [data]
    ON [times].d = [data].d
WHERE [data].d IS NULL

Upvotes: 2

Mitch
Mitch

Reputation: 22301

I am not quite certain what your table structure looks like, but assuming you are trying to find rows which have no successor within 10 minutes, you can do a self join:

-- Example schema
CREATE TABLE ExampleData (ValueDate datetime, Value int)
INSERT INTO ExampleData
VALUES ('2013-01-01 14:00:00', 1),
    ('2013-01-01 14:10:00', 1),
-- this row will be found as there is a gap
    ('2013-01-01 14:20:00', 1),
-- gap ends
    ('2013-01-01 14:40:00', 1)

-- Query    
SELECT gs.ValueDate as [Start of Gap], ge.ValueDate as [End of Gap], 
    gs.Value as [Value at start of gap], ge.Value as [Value at end of gap]
FROM (
    -- correlate all values with their successors
    SELECT gl.ValueDate, gl.Value, MIN(gh.ValueDate) as NextValueDate
    FROM ExampleData gl
    LEFT OUTER JOIN ExampleData gh on gl.ValueDate < gh.ValueDate
    GROUP BY gl.ValueDate, gl.Value
) as gs
LEFT OUTER JOIN ExampleData ge on gs.NextValueDate = ge.ValueDate
-- limit to only those values which have gaps of greater than 600 seconds
WHERE ge.ValueDate > DATEADD(second, 600, gs.ValueDate) OR ge.ValueDate is null

--Results:
--Start of Gap            End of Gap              Value at start of gap Value at end of gap
------------------------- ----------------------- --------------------- -------------------
--2013-01-01 14:20:00.000 2013-01-01 14:40:00.000 1                     1
--2013-01-01 14:40:00.000 NULL                    1                     NULL

Upvotes: 1

Mark Giaconia
Mark Giaconia

Reputation: 3953

super confusing question, here's another thought...

you could generate a table of datetimevalues for the time range you care about (Via a script of your own or use SSAS to generate a time dimension table). Perfom a while loop over the datetimevalues table, inside the loop query your table for that time stamp, then make sure the results of that query contain all the columns you need. Write new rows for any missing columns. something like this might help

    declare @temp table tempRes (columnname varchar(100))
    declare @timestamptable table (ts datetime)
insert @timestamptable select ts from mygeneratedTSTable
    while exists(select '' from @timestamptable)
    begin
        declare @ts datetime
        set @ts = (select top 1 ts from ts);
    --insert all the entries you have for the given timestamp
        insert @temp select distinct columnname from yourdata where yourTScol = @ts
    --now check to see if you are missing columns... if you have a lot of columns to check for
    --then do this part smarter
            if not exists (select '' from @temp where columnname = 'column a')
            begin
                insert into yourdata values('column a',null)
            end
            if not exists (select '' from @temp where columnname = 'column b')
            begin
                insert into yourdata values('column b',null)
            end
            --clear out the temp
        delete from @temp
        --remove this top element so the loop will terminate
        delete top (1) from @timestamptable
    end

hope this helps Also, rather than the IF blocks, you could have a list of column names and outer join that set to the results in the @temp table, and then insert any rows that have a null reference.

Upvotes: 0

Related Questions