Reputation: 10610
I have a table in which there are records about works accesing entrance doors.
DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)
Sample records:
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')
What I want to do is a query, which gives me following result (based on above example):
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-01 | 07:02:43 | 07:48:43 |
| Bruce Willis | 2009-09-01 | 07:12:43 | 07:22:43 |
| John Wayne | 2009-09-02 | 07:37:43 | 07:48:43 |
| Bruce Willis | 2009-09-02 | 07:42:43 | 07:52:43 |
Query I made is the following:
SELECT [user], accessDate AS [in date],
(SELECT MIN(accessDate)
FROM @doorStatistics ds2
WHERE accessType = 'OUT'
AND ds2.accessDate > ds.accessDate
AND ds.[user] = ds2.[user]) AS [out date]
FROM @doorStatistics ds
WHERE accessType = 'IN'
But it is not good, because when user forget to register his/her entrance it will produce for example something like this:
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne | 2009-09-02 | 07:02:43 | 09:26:43 |
While it should be
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne | 2009-09-02 | NULL | 09:26:43 |
The 2nd reason the query is not good is the performance. I have over 200 000 records and SELECT for every row slows down the query.
The possible solution may be to join two tables
SELECT * FROM @doorStatistics WHERE accessType = 'IN'
with
SELECT * FROM @doorStatistics WHERE accessType = 'OUT'
but I have no idea what conditions to put to get correct date. Maybe some MAX or MIN functions can be put there but I have no idea.
I don't want to create temporary table and use cursors.
Upvotes: 2
Views: 1739
Reputation: 76952
To improve performance on the structure level:
accessDate
column to accessDateTime
accessDateTime
(shown below). Then the index you need would include only the accessDate
column which you will use for exact comparison together with the user
accessDate
column definition:
accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
Now, given that you have done it and you have SQL-2005+, this terribly long query should do the job:
WITH MatchIN (in_id, out_id)
AS (SELECT s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
FROM @doorStatistics s
LEFT JOIN @doorStatistics x
ON x.id = (SELECT TOP 1 z.id
FROM @doorStatistics z
WHERE z."user" = s."user"
AND z.accessType = 'OUT'
AND z.accessDate = s.accessDate
AND z.accessDateTime >= s.accessDateTime
ORDER BY z.accessDateTime ASC
)
LEFT JOIN @doorStatistics y
ON y.id = (SELECT TOP 1 z.id
FROM @doorStatistics z
WHERE z."user" = s."user"
AND z.accessType = 'IN'
AND z.accessDate = s.accessDate
AND z.accessDateTime >= s.accessDateTime
AND z.accessDateTime <= x.accessDateTime
ORDER BY z.accessDateTime DESC
)
WHERE s.accessType = 'IN'
)
, MatchOUT (out_id, in_id)
AS (SELECT s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
FROM @doorStatistics s
LEFT JOIN @doorStatistics x
ON x.id = (SELECT TOP 1 z.id
FROM @doorStatistics z
WHERE z."user" = s."user"
AND z.accessType = 'IN'
AND z.accessDate = s.accessDate
AND z.accessDateTime <= s.accessDateTime
ORDER BY z.accessDateTime DESC
)
LEFT JOIN @doorStatistics y
ON y.id = (SELECT TOP 1 z.id
FROM @doorStatistics z
WHERE z."user" = s."user"
AND z.accessType = 'OUT'
AND z.accessDate = s.accessDate
AND z.accessDateTime <= s.accessDateTime
AND z.accessDateTime >= x.accessDateTime
ORDER BY z.accessDateTime ASC
)
WHERE s.accessType = 'OUT'
)
SELECT COALESCE(i."user", o."user") AS "user",
COALESCE(i.accessDate, o.accessDate) AS "date",
CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM (SELECT in_id, out_id FROM MatchIN
UNION -- this will eliminate duplicates as the same time
SELECT in_id, out_id FROM MatchOUT
) x
LEFT JOIN @doorStatistics i
ON i.id = x.in_id
LEFT JOIN @doorStatistics o
ON o.id = x.out_id
ORDER BY "user", "date", "inHour"
To test for handling of missing rows, just comment out some of your INSERT statements of test data.
Upvotes: 1
Reputation: 562230
When designing databases for temporal events that have duration, it's better practice to put the "IN" time and the "OUT" time on the same row.
All the queries you need to do are so much easier that way.
See "Joe Celko's SQL Programming Style" where he talks about temporal cohesion on pages 48 and 154.
Upvotes: 1
Reputation: 753475
You need to select the minimal OUT record for each IN record for a given user, after ensuring there is no intervening IN record (which would correspond to someone getting IN twice without ever leaving the building). That requires some modestly tricky SQL (a NOT EXISTS clause, for example). So, you are going to have a self-join on the table, plus a NOT EXISTS sub-query on the same table. Just make sure you alias all references to the table sensibly.
Upvotes: 1