Reputation: 707
NOTICE: this is advanced SQL... (IMHO)
Summary of what I am trying to do:
I have an application that puts data into the DB with a time stamp. What I would like to do is look at that data, in a chronological fashion, and determine the duration between a "start" and "stop" point
Here is UserRoadData table:
CREATE TABLE [dbo].[UserRoadData](
[Indx] [uniqueidentifier] NOT NULL,
[Indx_User] [uniqueidentifier] NOT NULL,
[Indx_RoadPoint] [uniqueidentifier] NOT NULL,
[TimeHit] [datetime] NOT NULL,
[Indx_UserRoadDataStatus] [int] NOT NULL,
CONSTRAINT [PK_UserRoadData] PRIMARY KEY CLUSTERED
Here is RoadPoints table:
CREATE TABLE [dbo].[RoadPoints](
[Indx] [uniqueidentifier] NOT NULL,
[Indx_Road] [uniqueidentifier] NOT NULL,
[Indx_PointType] [int] NOT NULL,
[Sequence] [int] NOT NULL,
[SegmentNumber] [int] NOT NULL,
[GeoLoc] [geography] NOT NULL,
CONSTRAINT [PK_RoadPoints] PRIMARY KEY CLUSTERED
I have the following query that attempts to determine durations ( NOTE: @NotProcessed = 0 at this point )
declare @Now datetime
set @Now = GETDATE();
with CTE_1 as(
SELECT
Indx_RoadPoint
,Indx_Road
,SegmentNumber
,Indx_UserRoadDataStatus
,case WHEN Indx_PointType = @SegmentStart THEN TimeHit
ELSE NULL
END AS InTime
,case WHEN Indx_PointType = @SegmentEnd THEN TimeHit
ELSE NULL
END AS OutTime
,row_number() OVER ( ORDER BY TimeHit ASC ) AS rowNum
FROM
UserRoadData INNER JOIN RoadPoints ON UserRoadData.Indx_RoadPoint = RoadPoints.Indx
where
Indx_User = @Indx_User
and
Indx_Road = @Indx_Road
and
Indx_UserRoadDataStatus = @NotProcessed
),
CTE_2 -- mark those that repeat
AS ( SELECT
t.Indx_RoadPoint
,t.Indx_Road
,t.SegmentNumber
,t.InTime
,t.OutTime
,t.rowNum
,case WHEN ( SELECT Indx_RoadPoint
FROM CTE_1 AS x
WHERE x.rowNum = t.rowNum - 1
) = t.Indx_RoadPoint THEN 1
ELSE 0
END AS mrk
FROM CTE_1 AS t
),
CTE_3 --extract non repeats and group
AS ( SELECT
*
,row_number() OVER ( PARTITION BY Indx_RoadPoint ORDER BY rowNum ASC ) AS rn2
FROM CTE_2
WHERE mrk = 0
)
SELECT
newid() as indx -- this table needs to have the same columns as UserRideData
,t1.Indx_Road
,@Indx_User as Indx_User
,t1.SegmentNumber
,t1.InTime
,t2.OutTime
,datediff(ss, t1.InTime, t2.OutTime) AS Duration
,@Now as DateRecorded
INTO #RoadAndRoadData -- results are put into a temp table...
FROM
CTE_3 AS t1
JOIN CTE_3 AS t2 ON t1.rn2 = t2.rn2
WHERE
t1.Intime IS NOT NULL
AND t2.OutTime IS NOT NULL
GROUP BY
t1.Indx_Road
,t1.SegmentNumber
,t1.InTime
,t2.OutTime
-- insert the temp table data into the real table
INSERT INTO dbo.UserRideData SELECT * FROM #RoadAndRoadData
Here is the entire output for the CTE_1 statement, it contains the right data ! Guids (indx) have been shortened to protect the innocent :-)
Indx Indx_Road Segment Status InTime OutTime rowNum
87502C53 28992B99 0 0 NULL NULL 1
17BB6691 28992B99 0 0 NULL NULL 2
C40FD10F 28992B99 1 0 2014-06-11 09:09:11.200 NULL 3
7BC5D0A6 28992B99 1 0 NULL NULL 4
97BA8F20 28992B99 1 0 NULL NULL 5
75A3F916 28992B99 1 0 NULL NULL 6
FA2B73E5 28992B99 1 0 NULL NULL 7
D1E16249 28992B99 1 0 NULL NULL 8
BAB45A3C 28992B99 1 0 NULL NULL 9
0EC3D9AD 28992B99 1 0 NULL NULL 10
3A0BAF2A 28992B99 1 0 NULL NULL 11
5B97F78A 28992B99 1 0 NULL 2014-06-11 09:09:20.200 12
E55C20C5 28992B99 2 0 2014-06-11 09:09:21.200 NULL 13
FBC14E4E 28992B99 2 0 NULL NULL 14
5396D1FF 28992B99 2 0 NULL NULL 15
63D5F64B 28992B99 2 0 NULL NULL 16
A463F4FA 28992B99 2 0 NULL 2014-06-11 09:09:25.200 17
F6A528D8 28992B99 0 0 NULL NULL 18
1D73335D 28992B99 0 0 NULL NULL 19
As you can see there is a start time, then an end time for each unique Segment eg:
Segment 1 indx C40FD10F has a non null start time
Segment 1 indx 5B97F78A has a non null stop time --- ( PAIR 1 )
Segment 2 indx E55C20C5 has a non null start time
Segment 2 indx A463F4FA has a non null stop time --- ( PAIR 2 )
This is the real important data from the above output and this question really is focused around getting only the duration into a table from PAIR 1 and PAIR 2
Indx Indx_Road Segment Status InTime OutTime rowNum
C40FD10F 28992B99 1 0 2014-06-11 09:09:11.200 NULL 3
5B97F78A 28992B99 1 0 NULL 2014-06-11 09:09:20.200 12 --- ( PAIR 1 )
E55C20C5 28992B99 2 0 2014-06-11 09:09:21.200 NULL 13
A463F4FA 28992B99 2 0 NULL 2014-06-11 09:09:25.200 17 --- ( PAIR 2 )
When the stored proc runs here are the results. Notice Segment 1 and 2 have two durations each... one of them is a false positive
indx Indx_Road Indx_User SegmentNumber InTime OutTime Duration DateRecorded
382A9F0D 28992B99 22222222 1 2014-06-11 09:09:11.200 2014-06-11 09:09:20.200 9 2014-06-11 09:09:28.207
BC942182 28992B99 22222222 1 2014-06-11 09:09:11.200 2014-06-11 09:09:25.200 14 2014-06-11 09:09:28.207
548A0340 28992B99 22222222 2 2014-06-11 09:09:21.200 2014-06-11 09:09:20.200 -1 2014-06-11 09:09:28.207
E8322022 28992B99 22222222 2 2014-06-11 09:09:21.200 2014-06-11 09:09:25.200 4 2014-06-11 09:09:28.207
I would like there to be only one duration, like this ( from PAIR 1 and PAIR 2 above )
indx Indx_Road Indx_User SegmentNumber InTime OutTime Duration DateRecorded
BC942182 28992B99 22222222 1 2014-06-11 09:09:11.200 2014-06-11 09:09:25.200 14 2014-06-11 09:09:28.207
E8322022 28992B99 22222222 2 2014-06-11 09:09:21.200 2014-06-11 09:09:25.200 4 2014-06-11 09:09:28.207
If you can provide any insights I would be grateful !
Thanks...
Upvotes: 0
Views: 104
Reputation: 1932
SELECT
CTE_2.Indx,
CTE_2.Indx_Road,
CTE_2.Indx_User,
CTE_2.SegmentNumber,
CTE_2.InTime,
CTE_2.OutTime,
DATEDIFF(SECOND, CTE_2.InTime, CTE_2.OutTime) AS Duration,
GETDATE() AS DateRecorded
FROM
(
SELECT
newid() AS Indx,
CTE_1.Indx_Road,
@Indx_User AS Indx_User,
CTE_1.SegmentNumber,
MAX(CTE_1.InTime) AS InTime,
MAX(CTE_1.OutTime) AS OutTime
FROM
(
SELECT
urd.Indx_RoadPoint,
rp.Indx_Road,
rp.SegmentNumber,
urd.Indx_UserRoadDataStatus,
(CASE WHEN Indx_PointType = @SegmentStart THEN TimeHit ELSE NULL END) AS InTime,
(CASE WHEN Indx_PointType = @SegmentEnd THEN TimeHit ELSE NULL END) AS OutTime,
(row_number() OVER ( ORDER BY TimeHit ASC )) AS rowNum
FROM UserRoadData urd
INNER JOIN RoadPoints rp
ON urd.Indx_RoadPoint = rp.Indx
AND (rp.Indx_PointType = @SegmentStart OR rp.Indx_PointType = @SegmentEnd)
WHERE urd.Indx_User = @Indx_User
and rp.Indx_Road = @Indx_Road
and urd.Indx_UserRoadDataStatus = @NotProcessed
) AS CTE_1
GROUP BY CTE_1.Indx_Road, CTE_1.SegmentNumber
) AS CTE_2
Upvotes: 0
Reputation: 13106
It's difficult to tell, given no sample starting data, but I agree with @Kevin - your query can be simplified from the start. In particular, your query is doing massive amounts of work it doesn't need to, that it often throws away.
This query makes the following assumptions:
In
and an Out
time, and only one of each. Out
time is always later than the In
time.SegmentNumber
is correct for all in/out pairs (although this can be generated if necessary).You should be able to use something close to the following:
SELECT NEWID() AS Indx,
@Indx_Road AS Indx_Road, @Indx_User AS Indx_User,
inTime, outTime, DATEDIFF(second, inTime, outTime) AS duration,
GETDATE() AS dateRecorded
FROM (SELECT Road.SegmentNumber,
MIN(UserRoad.timeHit) AS inTime, MAX(UserRoad.timeHit) AS outTime
FROM UserRoadData UserRoad
JOIN RoadPoints Road
ON Road.Indx = UserRoad.Indx_RoadPoint
AND Road.Indx_Road = @Indx_Road
AND Road.Indx_PointType IN (@SegmentStart, @SegmentEnd)
WHERE UserRoad.Indx_User = @Indx_User
AND UserRoad.Indx_UserRoadDataStatus = @NotProcessed
GROUP BY Road.SegmentNumber) SegmentTime
I doubt that you'll be able to use an index to answer the GROUP BY
directly, although the rest of the clauses should limit your starting set rather severely. You were simply throwing out most rows returned from CTE_1
, I'm not even bothering to include them.
I don't know about the performance implications in your case, but you should likely be able to insert this straight into the destination table, without messing with the intermediate temp-table.
Note that this query was also written with the assumption that you have few, run-on-demand, instances of your input parameters. If you're running this over bulk entries, the query should change.
Upvotes: 1