Reputation: 4190
So I have two tables. One tracking a a persons location, and one that has the shifts of staff members.
Staff members have a staffId, location, start and end times, and cost of that shift.
People have an eventId, stayId, personId, location, start and end time. A person will have an event with multiple stays.
What I am attempting to do is mesh these two tables together, so that I can accurately report the cost of each location stay, based on the duration of that stay multiplied by the associated cost of staff covering that location at that time.
The issues I have are:
My current method is to expand both tables to have a record for every single minute. So a stay that is between 1pm and 2pm will have 60 records, and a staff shift that goes for 5 hours will have 300 records. I can then take all staff that are working on that location at that minute to get a minute value based on the cost of each staff member divided by the duration of their shift, and apply that value to the corresponding record in the other table.
Techniques used:
I'm finding this process extremely slow as you can imagine, since my person table has about 500 million records when expanded to the minute level, and the staff table has about 35 million when the same thing is done.
Can people suggest a better method for me to use?
Sample data: Locations
| EventId | ID | Person | Loc | Start | End
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00
Staff
| Loc | StaffID | Cost | Start | End
| 1 | 99 | 40 | May, 20 2015 04:00:00 | May, 20 2015 12:00:00
| 1 | 15 | 85 | May, 20 2015 03:00:00 | May, 20 2015 5:00:00
| 3 | 85 | 74 | May, 20 2015 18:00:00 | May, 20 2015 20:00:00
| 4 | 10 | 36 | May, 20 2015 06:00:00 | May, 20 2015 14:00:00
Result
| EventId | ID | Person | Loc | Start | End | Cost
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 | 45.50
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 | 81.20
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 | 95.00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 | 14.75
SQL: Numbers table
;WITH x AS
(
SELECT TOP (224) object_id FROM sys.all_objects
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.object_id)
INTO #numbers
FROM x CROSS JOIN x AS y
ORDER BY n
Staff Table
SELECT
Location,
ISNULL(SUM(ROUND(Cost/ CASE WHEN (DateDiff(MINUTE, StartDateTime, EndDateTime)) = 0 THEN 1 ELSE (DateDiff(MINUTE, StartDateTime, EndDateTime)) END, 5)),0) AS MinuteCost,
Count(Name) AS StaffCount,
RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1, 0)
INTO #temp_StaffRoster
FROM dbo.StaffRoster
Grouping together, and where help is needed I think
INSERT INTO dbo.FinalTable
SELECT [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
,SUM(ISNULL(MinuteCost,1)/ISNULL(PeopleCount, 1)) AS Cost
,AVG(ISNULL(StaffCount,1)) AS AvgStaff
FROM dbo.Events event WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, Start, End)
LEFT OUTER JOIN #temp_StaffRoster staff WITH (NOLOCK) ON staff.Location= event.Location AND staff.RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
LEFT OUTER JOIN (SELECT [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0) AS Mins, COUNT(Id) as PeopleCount
FROM dbo.Events WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, Start, End)
GROUP BY [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
) cap ON cap.Location= event.LocationAND cap.Mins = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
GROUP BY [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
UPDATE
So I have two tables. One tracking a a persons location, and one that has the shifts of staff members with their cost. I am attempting to consolidate the two tables to calculate the cost of each location stay.
Here is my method:
;;WITH stay AS
(
SELECT TOP 650000
StayId,
Location,
Start,
End
FROM stg_Stay
WHERE Loction IS NOT NULL -- Some locations don't currently have a matching shift location
ORDER BY Location, ADTM
),
shift AS
(
SELECT TOP 36000000
Location,
ShiftMinute,
MinuteCost,
StaffCount
FROM stg_Shifts
ORDER BY Location, ShiftMinute
)
SELECT
[StayId],
SUM(MinuteCost) AS Cost,
AVG(StaffCount) AS StaffCount
INTO newTable
FROM stay S
CROSS APPLY (SELECT MinuteCost, StaffCount
FROM shift R
WHERE R.Location = S.Location
AND R.ShiftMinute BETWEEN S.Start AND S.End
) AS Shifts
GROUP BY [StayId]
This is where I'm at.
I've split the Shifts table into a minute by minute level since there is no clear alignment of shifts to stays.
stg_Stay contains more columns than needed for this operation. stg_Shift is as shown.
Indexes used on stg_Shifts:
CREATE NONCLUSTERED INDEX IX_Shifts_Loc_Min
ON dbo.stg_Shifts (Location, ShiftMinute)
INCLUDE (MinuteCost, StaffCount);
on stg_Stay
CREATE INDEX IX_Stay_StayId ON dbo.stg_Stay (StayId);
CREATE CLUSTERED INDEX IX_Stay_Start_End_Loc ON dbo.stg_Stay (Location,Start,End);
Due to the fact that Shifts has ~36 million records and Stays has ~650k, what can I do to make this perform better?
Upvotes: 3
Views: 248
Reputation: 5743
SELECT *
FROM Locations l
OUTER APPLY -- Assume a staff won't appear in different location in the same period of time, of course.
(
SELECT
CONVERT(decimal(14,2), SUM(CostPerMinute * OverlappedMinutes)) AS ActualCost,
COUNT(DISTINCT StaffId) AS StaffCount,
SUM(OverlappedMinutes) AS StaffMinutes
FROM
(
SELECT
*,
-- Calculate overlapped time in minutes
DATEDIFF(MINUTE,
CASE WHEN StartTime > l.StartTime THEN StartTime ELSE l.StartTime END, -- Get greatest start time
CASE WHEN EndTime > l.EndTime THEN l.EndTime ELSE EndTime END -- Get least end time
) AS OverlappedMinutes,
Cost / DATEDIFF(MINUTE, StartTime, EndTime) AS CostPerMinute
FROM Staff
WHERE LocationId = l.LocationId
AND StartTime <= l.EndTime AND l.StartTime <= EndTime -- Match with overlapped time
) data
) StaffInLoc
Upvotes: 1
Reputation: 3668
Take below with a grain of salt since your naming is horrible.
Location should really be a Stay as i guess location is another table defining an single physical location.
Your Staff table is also badly named. Why not name it Shift. I would expect a staff table to contain stuff like Name, Phone etc. Where a Shift table can contain multiple shifts for the same Staff etc.
Second i think your missing a relation between the two tables.
If you join Location and Staff only on Location and overlapping date times i don't think it would make a whole lot of sense for what your trying to do. How do you know which staff is at any location for a given time? Onlything you can do with location and overlapping dates is assume a entry is in the location table relates to every staff who have a shift at that location within the timeframe. So look at the below more as an inspiration to solving your problems and how to find overlapping datetime intervals and less like an actual solution to your problem since i think your data and model is in a bad shape.
If i got it all wrong please provide Primary Keys and Foreign Keys on your tables and a better explanation.
Some dummy data
DROP TABLE dbo.Location
CREATE TABLE dbo.Location
(
StayId INT,
EventId INT,
PersonId INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 987 ,1 ,123 ,1 ,'2015-05-20T07:00:00','2015-05-20T08:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 374 ,1 ,123 ,4 ,'2015-05-20T08:00:00','2015-05-20T10:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 184 ,1 ,123 ,3 ,'2015-05-20T10:00:00','2015-05-20T11:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 798 ,1 ,123 ,8 ,'2015-05-20T11:00:00','2015-05-20T12:00:00')
DROP TABLE dbo.Staff
CREATE TABLE Staff
(
StaffId INT,
Cost INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 99 ,40 ,1 ,'2015-05-20T04:00:00','2015-05-20T12:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 15 ,85 ,1 ,'2015-05-20T03:00:00','2015-05-20T05:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 85 ,74 ,3 ,'2015-05-20T18:00:00','2015-05-20T20:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 10 ,36 ,4 ,'2015-05-20T06:00:00','2015-05-20T14:00:00')
Actual query
WITH OnLocation AS
(
SELECT
L.StayId, L.EventId, L.LocationId, L.PersonId, S.Cost
, IIF(L.StartTime > S.StartTime, L.StartTime, S.StartTime) AS OnLocationStartTime
, IIF(L.EndTime < S.EndTime, L.EndTime, S.EndTime) AS OnLocationEndTime
FROM dbo.Location L
LEFT JOIN dbo.Staff S
ON S.LocationId = L.LocationId -- TODO are you not missing a join condition on staffid
-- Detects any overlaps between stays and shifts
AND L.StartTime <= S.EndTime AND L.EndTime >= S.StartTime
)
SELECT
*
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) AS DurationMinutes
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) / 60.0 * Cost AS DurationCost
FROM OnLocation D
To get a summary you can take the query and add a GROUP BY for whatever your wan't to summarize.
Upvotes: 0