MrLister
MrLister

Reputation: 707

How can I determine a time duration

NOTICE: this is advanced SQL... (IMHO)

Summary of what I am trying to do:

I have an application that puts data into the DB with a time stamp. What I would like to do is look at that data, in a chronological fashion, and determine the duration between a "start" and "stop" point

Here is UserRoadData table:

    CREATE TABLE [dbo].[UserRoadData](
        [Indx] [uniqueidentifier] NOT NULL,
        [Indx_User] [uniqueidentifier] NOT NULL,
        [Indx_RoadPoint] [uniqueidentifier] NOT NULL,
        [TimeHit] [datetime] NOT NULL,
        [Indx_UserRoadDataStatus] [int] NOT NULL,
     CONSTRAINT [PK_UserRoadData] PRIMARY KEY CLUSTERED 

Here is RoadPoints table:

    CREATE TABLE [dbo].[RoadPoints](
        [Indx] [uniqueidentifier] NOT NULL,
        [Indx_Road] [uniqueidentifier] NOT NULL,
        [Indx_PointType] [int] NOT NULL,
        [Sequence] [int] NOT NULL,
        [SegmentNumber] [int] NOT NULL,
        [GeoLoc] [geography] NOT NULL,
     CONSTRAINT [PK_RoadPoints] PRIMARY KEY CLUSTERED 

I have the following query that attempts to determine durations ( NOTE: @NotProcessed = 0 at this point )

        declare @Now datetime
        set @Now = GETDATE();

        with CTE_1 as( 
        SELECT
                Indx_RoadPoint
                ,Indx_Road
                ,SegmentNumber
                ,Indx_UserRoadDataStatus
                ,case WHEN Indx_PointType = @SegmentStart THEN TimeHit
                    ELSE NULL
                END AS InTime
                ,case WHEN Indx_PointType = @SegmentEnd THEN TimeHit
                    ELSE NULL
                END AS OutTime
                ,row_number() OVER ( ORDER BY TimeHit ASC ) AS rowNum
                FROM
                UserRoadData INNER JOIN RoadPoints ON UserRoadData.Indx_RoadPoint = RoadPoints.Indx 
                where 
                Indx_User = @Indx_User
                and
                Indx_Road = @Indx_Road
                and
                Indx_UserRoadDataStatus = @NotProcessed
                ),

          CTE_2 -- mark those that repeat
            AS ( SELECT
                t.Indx_RoadPoint
                ,t.Indx_Road
                ,t.SegmentNumber
                ,t.InTime
                ,t.OutTime
                ,t.rowNum
                ,case WHEN ( SELECT Indx_RoadPoint
                              FROM CTE_1 AS x
                              WHERE x.rowNum = t.rowNum - 1
                            ) = t.Indx_RoadPoint THEN 1
                       ELSE 0
                  END AS mrk
                 FROM CTE_1 AS t
               ),

          CTE_3 --extract non repeats and group
            AS ( SELECT
                  *
                 ,row_number() OVER ( PARTITION BY Indx_RoadPoint ORDER BY rowNum ASC ) AS rn2
                 FROM CTE_2
                 WHERE  mrk = 0
               )
      SELECT
        newid() as indx  -- this table needs to have the same columns as UserRideData
        ,t1.Indx_Road
        ,@Indx_User as Indx_User
        ,t1.SegmentNumber
        ,t1.InTime
        ,t2.OutTime
        ,datediff(ss, t1.InTime, t2.OutTime) AS Duration
        ,@Now as DateRecorded

      INTO #RoadAndRoadData  -- results are put into a temp table...

      FROM
        CTE_3 AS t1
        JOIN CTE_3 AS t2 ON t1.rn2 = t2.rn2
      WHERE
        t1.Intime IS NOT NULL
        AND t2.OutTime IS NOT NULL

      GROUP BY
        t1.Indx_Road
        ,t1.SegmentNumber
        ,t1.InTime
        ,t2.OutTime


 -- insert the temp table data into the real table
 INSERT INTO dbo.UserRideData SELECT * FROM #RoadAndRoadData

Here is the entire output for the CTE_1 statement, it contains the right data ! Guids (indx) have been shortened to protect the innocent :-)

Indx    Indx_Road   Segment Status  InTime  OutTime rowNum

87502C53    28992B99    0   0   NULL    NULL    1
17BB6691    28992B99    0   0   NULL    NULL    2
C40FD10F    28992B99    1   0   2014-06-11 09:09:11.200 NULL    3
7BC5D0A6    28992B99    1   0   NULL    NULL    4
97BA8F20    28992B99    1   0   NULL    NULL    5
75A3F916    28992B99    1   0   NULL    NULL    6
FA2B73E5    28992B99    1   0   NULL    NULL    7
D1E16249    28992B99    1   0   NULL    NULL    8
BAB45A3C    28992B99    1   0   NULL    NULL    9
0EC3D9AD    28992B99    1   0   NULL    NULL    10
3A0BAF2A    28992B99    1   0   NULL    NULL    11
5B97F78A    28992B99    1   0   NULL    2014-06-11 09:09:20.200 12
E55C20C5    28992B99    2   0   2014-06-11 09:09:21.200 NULL    13
FBC14E4E    28992B99    2   0   NULL    NULL    14
5396D1FF    28992B99    2   0   NULL    NULL    15
63D5F64B    28992B99    2   0   NULL    NULL    16
A463F4FA    28992B99    2   0   NULL    2014-06-11 09:09:25.200 17
F6A528D8    28992B99    0   0   NULL    NULL    18
1D73335D    28992B99    0   0   NULL    NULL    19

As you can see there is a start time, then an end time for each unique Segment eg:

 Segment 1 indx C40FD10F has a non null start time
 Segment 1 indx 5B97F78A has a non null stop time  --- ( PAIR 1 )
 Segment 2 indx E55C20C5 has a non null start time
 Segment 2 indx A463F4FA has a non null stop time  --- ( PAIR 2 )

This is the real important data from the above output and this question really is focused around getting only the duration into a table from PAIR 1 and PAIR 2

Indx    Indx_Road   Segment Status  InTime  OutTime rowNum

C40FD10F    28992B99    1   0   2014-06-11 09:09:11.200 NULL    3
5B97F78A    28992B99    1   0   NULL    2014-06-11 09:09:20.200 12   --- ( PAIR 1 )

E55C20C5    28992B99    2   0   2014-06-11 09:09:21.200 NULL    13
A463F4FA    28992B99    2   0   NULL    2014-06-11 09:09:25.200 17   --- ( PAIR 2 )

When the stored proc runs here are the results. Notice Segment 1 and 2 have two durations each... one of them is a false positive

indx    Indx_Road   Indx_User   SegmentNumber   InTime  OutTime Duration    DateRecorded
382A9F0D    28992B99    22222222    1   2014-06-11 09:09:11.200 2014-06-11 09:09:20.200 9   2014-06-11 09:09:28.207
BC942182    28992B99    22222222    1   2014-06-11 09:09:11.200 2014-06-11 09:09:25.200 14  2014-06-11 09:09:28.207
548A0340    28992B99    22222222    2   2014-06-11 09:09:21.200 2014-06-11 09:09:20.200 -1  2014-06-11 09:09:28.207
E8322022    28992B99    22222222    2   2014-06-11 09:09:21.200 2014-06-11 09:09:25.200 4   2014-06-11 09:09:28.207

I would like there to be only one duration, like this ( from PAIR 1 and PAIR 2 above )

indx    Indx_Road   Indx_User   SegmentNumber   InTime  OutTime Duration    DateRecorded
BC942182    28992B99    22222222    1   2014-06-11 09:09:11.200 2014-06-11 09:09:25.200 14  2014-06-11 09:09:28.207
E8322022    28992B99    22222222    2   2014-06-11 09:09:21.200 2014-06-11 09:09:25.200 4   2014-06-11 09:09:28.207

If you can provide any insights I would be grateful !

Thanks...

Upvotes: 0

Views: 104

Answers (2)

Kevin Cook
Kevin Cook

Reputation: 1932

SELECT 
    CTE_2.Indx,
    CTE_2.Indx_Road,
    CTE_2.Indx_User,
      CTE_2.SegmentNumber,
    CTE_2.InTime,
    CTE_2.OutTime,
    DATEDIFF(SECOND, CTE_2.InTime, CTE_2.OutTime) AS Duration,
    GETDATE() AS DateRecorded
FROM
(
    SELECT 
        newid() AS Indx, 
        CTE_1.Indx_Road, 
        @Indx_User AS Indx_User,
                CTE_1.SegmentNumber,
        MAX(CTE_1.InTime) AS InTime,
        MAX(CTE_1.OutTime) AS OutTime
    FROM
    (
        SELECT
            urd.Indx_RoadPoint,
            rp.Indx_Road,
            rp.SegmentNumber,
            urd.Indx_UserRoadDataStatus,
            (CASE WHEN Indx_PointType = @SegmentStart THEN TimeHit ELSE NULL END) AS InTime,
            (CASE WHEN Indx_PointType = @SegmentEnd THEN TimeHit ELSE NULL END) AS OutTime,
            (row_number() OVER ( ORDER BY TimeHit ASC )) AS rowNum
        FROM UserRoadData urd
        INNER JOIN RoadPoints rp
            ON urd.Indx_RoadPoint = rp.Indx 
            AND (rp.Indx_PointType = @SegmentStart OR rp.Indx_PointType = @SegmentEnd)
        WHERE urd.Indx_User = @Indx_User
            and rp.Indx_Road = @Indx_Road
            and urd.Indx_UserRoadDataStatus = @NotProcessed
    ) AS CTE_1
    GROUP BY CTE_1.Indx_Road, CTE_1.SegmentNumber
) AS CTE_2

Upvotes: 0

Clockwork-Muse
Clockwork-Muse

Reputation: 13106

It's difficult to tell, given no sample starting data, but I agree with @Kevin - your query can be simplified from the start. In particular, your query is doing massive amounts of work it doesn't need to, that it often throws away.

This query makes the following assumptions:

  1. All segments have both an In and an Out time, and only one of each.
  2. The Out time is always later than the In time.
  3. SegmentNumber is correct for all in/out pairs (although this can be generated if necessary).

You should be able to use something close to the following:

SELECT NEWID() AS Indx,
       @Indx_Road AS Indx_Road, @Indx_User AS Indx_User,
       inTime, outTime, DATEDIFF(second, inTime, outTime) AS duration,
       GETDATE() AS dateRecorded
FROM (SELECT Road.SegmentNumber,
             MIN(UserRoad.timeHit) AS inTime, MAX(UserRoad.timeHit) AS outTime
      FROM UserRoadData UserRoad
      JOIN RoadPoints Road
        ON Road.Indx = UserRoad.Indx_RoadPoint
           AND Road.Indx_Road = @Indx_Road
           AND Road.Indx_PointType IN (@SegmentStart, @SegmentEnd)
      WHERE UserRoad.Indx_User = @Indx_User
            AND UserRoad.Indx_UserRoadDataStatus = @NotProcessed
      GROUP BY Road.SegmentNumber) SegmentTime

I doubt that you'll be able to use an index to answer the GROUP BY directly, although the rest of the clauses should limit your starting set rather severely. You were simply throwing out most rows returned from CTE_1, I'm not even bothering to include them.
I don't know about the performance implications in your case, but you should likely be able to insert this straight into the destination table, without messing with the intermediate temp-table.
Note that this query was also written with the assumption that you have few, run-on-demand, instances of your input parameters. If you're running this over bulk entries, the query should change.

Upvotes: 1

Related Questions