Reputation: 2470
I have a column set (let's say column A and column B). Each column can have multiple points (datetime and value). Each column set have predefined timestep. For below example it's: 600sec.
columnA
dateTime - '2013-10-05 00:10:00'
Value - 0.51231
dateTime - '2013-10-05 00:20:00'
Value - 0.54123
dateTime - '2013-10-05 00:30:00'
Value - 0.53312
...
dateTime - '2013-11-14 13:30:00'
Value - 0.53312
etc.
columnB
dateTime - '2013-10-05 00:10:00'
Value - 0.51231
dateTime - '2013-10-05 00:20:00'
Value - 0.54123
dateTime - '2013-10-05 00:50:00'
Value - 0.54123
...
dateTime - '2013-11-14 13:20:00'
Value - 0.53312
dateTime - '2013-11-14 13:30:00'
Value - 0.53312
dateTime - '2013-11-14 13:40:00'
Value - 0.53312
I need to find missing data points for each column and insert NULL points so all columns would have exact amount of points.
For above examples missing data points are:
columnA - 2013-11-14 13:40:00
columnB - 2013-10-05 00:30:00, 2013-10-05 00:40:00
I know I can iterate using T-SQL or PHP (client side) for each column for given column set and check each point, but it would be very slow I think. Especially PHP solution.
There is possibly smart SQL that would be able to select missing dateTimes (gaps).
Any ideas?
UPDATE: Because it's not clear about the DB structure. Column is a field within the table, so, let's have a table: tblValues. There are 3 fields: columnName, datetime and value. In our examples columnName would be the columnA or columnB.
Upvotes: 0
Views: 569
Reputation: 32737
It's not too bad. The key here is to keep a numbers table around. It's just a table with one column and the numbers 1 through as many as you want in it. They're incredibly useful. Here's my favorite method to create one (courtesy of Itzik Ben-Gan):
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
Tally as (select row_number() over(order by C) as Number from Pass5)
select Number INTO dbo.Numbers from Tally where Number <= 1000000
Now that you've got that, it's just a matter of using the numbers table to generate where the data should be and then left join it to what you actually have. Something like this:
WITH [data] AS (SELECT * FROM (VALUES
('2013-10-05 00:10:00', 0.51231),
('2013-10-05 00:20:00', 0.54123),
('2013-10-05 00:30:00', 0.53312),
('2013-10-05 00:50:00', 0.54123)
) AS x(d, v)),
[times] AS (
SELECT TOP 10 DATEADD(MINUTE, 10*[Number], '2013-10-05') AS [d]
FROM dba.dbo.[Numbers] AS n
)
SELECT * FROM [times]
LEFT JOIN [data]
ON [times].d = [data].d
WHERE [data].d IS NULL
Upvotes: 2
Reputation: 22301
I am not quite certain what your table structure looks like, but assuming you are trying to find rows which have no successor within 10 minutes, you can do a self join:
-- Example schema
CREATE TABLE ExampleData (ValueDate datetime, Value int)
INSERT INTO ExampleData
VALUES ('2013-01-01 14:00:00', 1),
('2013-01-01 14:10:00', 1),
-- this row will be found as there is a gap
('2013-01-01 14:20:00', 1),
-- gap ends
('2013-01-01 14:40:00', 1)
-- Query
SELECT gs.ValueDate as [Start of Gap], ge.ValueDate as [End of Gap],
gs.Value as [Value at start of gap], ge.Value as [Value at end of gap]
FROM (
-- correlate all values with their successors
SELECT gl.ValueDate, gl.Value, MIN(gh.ValueDate) as NextValueDate
FROM ExampleData gl
LEFT OUTER JOIN ExampleData gh on gl.ValueDate < gh.ValueDate
GROUP BY gl.ValueDate, gl.Value
) as gs
LEFT OUTER JOIN ExampleData ge on gs.NextValueDate = ge.ValueDate
-- limit to only those values which have gaps of greater than 600 seconds
WHERE ge.ValueDate > DATEADD(second, 600, gs.ValueDate) OR ge.ValueDate is null
--Results:
--Start of Gap End of Gap Value at start of gap Value at end of gap
------------------------- ----------------------- --------------------- -------------------
--2013-01-01 14:20:00.000 2013-01-01 14:40:00.000 1 1
--2013-01-01 14:40:00.000 NULL 1 NULL
Upvotes: 1
Reputation: 3953
super confusing question, here's another thought...
you could generate a table of datetimevalues for the time range you care about (Via a script of your own or use SSAS to generate a time dimension table). Perfom a while loop over the datetimevalues table, inside the loop query your table for that time stamp, then make sure the results of that query contain all the columns you need. Write new rows for any missing columns. something like this might help
declare @temp table tempRes (columnname varchar(100))
declare @timestamptable table (ts datetime)
insert @timestamptable select ts from mygeneratedTSTable
while exists(select '' from @timestamptable)
begin
declare @ts datetime
set @ts = (select top 1 ts from ts);
--insert all the entries you have for the given timestamp
insert @temp select distinct columnname from yourdata where yourTScol = @ts
--now check to see if you are missing columns... if you have a lot of columns to check for
--then do this part smarter
if not exists (select '' from @temp where columnname = 'column a')
begin
insert into yourdata values('column a',null)
end
if not exists (select '' from @temp where columnname = 'column b')
begin
insert into yourdata values('column b',null)
end
--clear out the temp
delete from @temp
--remove this top element so the loop will terminate
delete top (1) from @timestamptable
end
hope this helps Also, rather than the IF blocks, you could have a list of column names and outer join that set to the results in the @temp table, and then insert any rows that have a null reference.
Upvotes: 0