Reputation: 18564
I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
Upvotes: 10
Views: 18592
Reputation: 11
OLD POST but ....
Easies way is to use the Lag function and TIMESTAMPDIFF
SELECT
id,
TIMESTAMPDIFF('MINUTES', PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
SELECT
id,
TIMESTAMP,
LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
FROM TABLE_NAME
)
Upvotes: 1
Reputation: 449
Adapted for SQL Server from this discussion.
Essential columns used are: cmis_load_date: A date/time stamp associated with each record. extract_file: The full path to a file from which the record was loaded.
Comments: There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber
Upvotes: 0
Reputation: 44173
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be miles quicker for large tables as it has no n-squared JOIN
This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows -1.
Proof: The average distance between consecutive rows is the sum of the distance between consective rows, divided by the number of consecutive rows. But the sum of the difference between consecutive rows is just the distance between the first row and last row (assuming they are sorted by timestamp). And the number of consecutive rows is the total number of rows -1.
Then we just condition the timestamps to be distinct.
Upvotes: 38
Reputation: 238096
Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the id's are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
select *
from table inbetween
where prev.datecol < inbetween.datecol
and inbetween.datecol < cur.datecol)
)
Upvotes: 1
Reputation: 44268
Are the ID's contiguous ?
You could do something like,
SELECT
a.ID
, b.ID
, a.Timestamp
, b.Timestamp
, b.timestamp - a.timestamp as Difference
FROM
MyTable a
JOIN MyTable b
ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap that up in an AVG grouping...
Upvotes: 2