Reputation: 650
I have a SQLite Database with the following structure:
rowid ID startTimestamp endTimestamp subject
1 00:50:c2:63:10:1a 1000 1090 entrance
2 00:50:c2:63:10:1a 1100 1270 entrance
3 00:50:c2:63:10:1a 1300 1310 door1
4 00:50:c2:63:10:1a 1370 1400 entrance
.
.
.
I have prepared a sqlfiddle here: http://sqlfiddle.com/#!2/fe8c6/2
With this SQL-Query i can get the average differences between the endTime and the startTime between one row and the following row, sorted by subject and ID:
SELECT
id,
( MAX(endtimestamp) - MIN(startTimestamp)
- SUM(endtimestamp-startTimestamp)
) / (COUNT(*)-1) AS averageDifference
FROM
table1
WHERE ID = '00:50:c2:63:10:1a'
AND subject = 'entrance'
GROUP BY id;
My problem: To calcute the average value is no problem, that does this query. But how can i get the standard deviation and the variance of this values?
Upvotes: 6
Views: 15370
Reputation: 2473
A number of points:
SUM(endtimestamp-starttimestamp)/COUNT(endtimestamp)
. I have no idea why you have the MIN/MAX
terms. COUNT(*)
will count NULL
rows and will give the wrong result.avg
function which finds the mean.SUM((endtimestamp-starttimestamp)*(endtimestamp-starttimestamp)) - AVG(endtimestamp-starttimestamp)*AVG(endtimestamp-starttimestamp)
In response to the question authors comment, in order to compute the variance the start and end times must be paired with each other through a self join.
Becuase of the absence of a row_number function in SQL lite this is a little inelegant.
SELECT id,
AVG(startTimestamp-endTimestamp) as mean,
SUM((startTimestamp-endTimestamp)^2) - AVG(startTimestamp-endTimestamp)^2 as variance,
SQRT(SUM((startTimestamp-endTimestamp)^2) - AVG(startTimestamp-endTimestamp)^2) as stDev
FROM
(SELECT
t1.id,
t1.endTimestamp,
MIN(t2.startTimestamp) as starttimestamp
FROM table1 t1
INNER JOIN
table1 t2 ON t1.endTimestamp<=t2.startTimestamp
GROUP BY t1.id, t1.endTimestamp) t
GROUP BY id;
See SQL Fiddle
Upvotes: 1
Reputation: 6315
First finding the time differences of interest by joining the table to itself and grouping by ID, then finding the averages, variances as V(x) = E(x^2) - (E(x))^2
and standard deviation as sqrt(V)
gives
SELECT ID, AVG(diff) AS average,
AVG(diff*diff) - AVG(diff)*AVG(diff) AS variance,
SQRT(AVG(diff*diff) - AVG(diff)*AVG(diff)) AS stdev
FROM
(SELECT t1.id, t1.endTimestamp,
min(t2.startTimeStamp) - t1.endTimestamp AS diff
FROM table1 t1
INNER JOIN table1 t2
ON t2.ID = t1.ID AND t2.subject = t1.subject
AND t2.startTimestamp > t1.startTimestamp -- consider only later startTimestamps
WHERE t1.subject = 'entrance'
GROUP BY t1.id, t1.endTimestamp) AS diffs
GROUP BY ID
Upvotes: 3
Reputation: 180162
For formulas that are more complex than simple summation, you have to compute the actual difference values for each record by lookin up the corresponding next start times, like this:
SELECT (SELECT MIN(startTimestamp)
FROM table1 AS next
WHERE next.startTimestamp > table1.startTimestamp
AND ID = '...'
) - endTimestamp AS timeDifference
FROM table1
WHERE nextStartTimestamp IS NOT NULL
AND ID = '...'
Then you can use all the difference values to do the calculations:
SELECT SUM(timeDifference) / COUNT(*) AS average,
AVG(timeDifference) AS moreEfficientAverage,
SUM(timeDifference * timeDifference) / COUNT(*) -
AVG(timeDifference) * AVG(timeDifference) AS variance
FROM (SELECT (SELECT MIN(startTimestamp)
FROM table1 AS next
WHERE next.startTimestamp > table1.startTimestamp
AND next.ID = '...'
) - endTimestamp AS timeDifference
FROM table1
WHERE nextStartTimestamp IS NOT NULL
AND ID = '...')
Upvotes: 3