Best method to merge two sql tables toegether

Question

So I have two tables. One tracking a a persons location, and one that has the shifts of staff members.

Staff members have a staffId, location, start and end times, and cost of that shift.

People have an eventId, stayId, personId, location, start and end time. A person will have an event with multiple stays.

What I am attempting to do is mesh these two tables together, so that I can accurately report the cost of each location stay, based on the duration of that stay multiplied by the associated cost of staff covering that location at that time.

The issues I have are:

Location stays do not align with staff shifts. i.e. a person might be in location a between 1pm and 2pm, and four staff might be on shifts from 12:30 to 1:30, and two on from 1:30 till 5.
There are a lot of records.
Not all staff are paid the same

My current method is to expand both tables to have a record for every single minute. So a stay that is between 1pm and 2pm will have 60 records, and a staff shift that goes for 5 hours will have 300 records. I can then take all staff that are working on that location at that minute to get a minute value based on the cost of each staff member divided by the duration of their shift, and apply that value to the corresponding record in the other table.

Techniques used:

I create a table with 50,000 numbers, since some stays can be quite long.
I take the staff table and join onto the numbers table to split each shift. Then group it together based on location and minute, with a staff count and minute cost.
The final step, and the one causing issues, is where I take the location table, join onto numbers, and also onto the modified staff table to produce a cost for that minute. I also count the number of people in that location to account for staff covering multiple people.

I'm finding this process extremely slow as you can imagine, since my person table has about 500 million records when expanded to the minute level, and the staff table has about 35 million when the same thing is done.

Can people suggest a better method for me to use?

Sample data: Locations

| EventId |  ID | Person | Loc |          Start         |         End
|  1      | 987 |  123   |  1  | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 
|  1      | 374 |  123   |  4  | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 
|  1      | 184 |  123   |  3  | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 
|  1      | 798 |  123   |  8  | May, 20 2015 11:00:00 | May, 20 2015 12:00:00

Staff

| Loc | StaffID | Cost |         Start         |         End
|  1  | 99      |  40  | May, 20 2015 04:00:00 | May, 20 2015 12:00:00 
|  1  | 15      |  85  | May, 20 2015 03:00:00 | May, 20 2015 5:00:00 
|  3  | 85      |  74  | May, 20 2015 18:00:00 | May, 20 2015 20:00:00 
|  4  | 10      |  36  | May, 20 2015 06:00:00 | May, 20 2015 14:00:00

Result

| EventId | ID | Person | Loc | Start | End | Cost | 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 | 45.50 | 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 | 81.20 | 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 | 95.00 | 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 | 14.75

SQL: Numbers table

;WITH x AS 
(
  SELECT TOP (224) object_id  FROM sys.all_objects 
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.object_id) 
INTO #numbers
FROM x CROSS JOIN x AS y 
ORDER BY n

Staff Table

SELECT 
    Location,
    ISNULL(SUM(ROUND(Cost/ CASE WHEN (DateDiff(MINUTE, StartDateTime, EndDateTime)) = 0 THEN 1 ELSE (DateDiff(MINUTE, StartDateTime, EndDateTime)) END, 5)),0) AS MinuteCost,
    Count(Name) AS StaffCount,
    RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1,     0) 
INTO #temp_StaffRoster
FROM dbo.StaffRoster

Grouping together, and where help is needed I think

    INSERT INTO dbo.FinalTable
    SELECT [EventId]
          ,[Id]
          ,[Start]
          ,[End]
          ,event.[Location]
          ,SUM(ISNULL(MinuteCost,1)/ISNULL(PeopleCount, 1)) AS Cost
          ,AVG(ISNULL(StaffCount,1)) AS AvgStaff
      FROM dbo.Events event WITH (NOLOCK) 
      INNER JOIN #numbers n ON n.n BETWEEN 0 AND  DATEDIFF(MINUTE, Start, End)
      LEFT OUTER JOIN #temp_StaffRoster staff WITH (NOLOCK) ON staff.Location= event.Location AND staff.RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 ,     0)
      LEFT OUTER JOIN (SELECT [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 ,     0) AS Mins, COUNT(Id) as PeopleCount
                       FROM dbo.Events WITH (NOLOCK) 
                       INNER JOIN #numbers n ON n.n BETWEEN 0 AND  DATEDIFF(MINUTE, Start, End)
                       GROUP BY [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 ,     0)
                       ) cap ON cap.Location= event.LocationAND cap.Mins = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 ,     0)

      GROUP BY [EventId]
              ,[Id]
              ,[Start]
              ,[End]
              ,event.[Location]

UPDATE

So I have two tables. One tracking a a persons location, and one that has the shifts of staff members with their cost. I am attempting to consolidate the two tables to calculate the cost of each location stay.

Here is my method:

;;WITH stay AS
(
    SELECT TOP 650000
        StayId,
        Location,
        Start,
        End
    FROM stg_Stay
    WHERE Loction IS NOT NULL  -- Some locations don't currently have a matching shift location
    ORDER BY Location, ADTM
),
shift AS
(
    SELECT TOP 36000000
        Location,
        ShiftMinute,
        MinuteCost,
        StaffCount
    FROM stg_Shifts
    ORDER BY Location, ShiftMinute
)

SELECT 
    [StayId],
    SUM(MinuteCost) AS Cost,
    AVG(StaffCount) AS StaffCount
INTO newTable
FROM stay S
CROSS APPLY (SELECT MinuteCost, StaffCount
                FROM shift R 
                WHERE R.Location = S.Location
                 AND R.ShiftMinute BETWEEN S.Start AND S.End 
            ) AS Shifts
GROUP BY [StayId]

This is where I'm at.

I've split the Shifts table into a minute by minute level since there is no clear alignment of shifts to stays.

stg_Stay contains more columns than needed for this operation. stg_Shift is as shown.

Indexes used on stg_Shifts:

CREATE NONCLUSTERED INDEX IX_Shifts_Loc_Min
ON dbo.stg_Shifts (Location, ShiftMinute)
INCLUDE (MinuteCost, StaffCount);

on stg_Stay

CREATE INDEX IX_Stay_StayId ON dbo.stg_Stay (StayId);
CREATE CLUSTERED INDEX IX_Stay_Start_End_Loc ON dbo.stg_Stay (Location,Start,End);

Due to the fact that Shifts has ~36 million records and Stays has ~650k, what can I do to make this perform better?

Eric · Accepted Answer

Don't break down the rows by minutes.
Staging table may help if you can create fast relationship between them. i.e. the overlapped interval

SELECT * 
FROM Locations l
OUTER APPLY -- Assume a staff won't appear in different location in the same period of time, of course.
(
  SELECT 
    CONVERT(decimal(14,2), SUM(CostPerMinute * OverlappedMinutes)) AS ActualCost,
    COUNT(DISTINCT StaffId) AS StaffCount,
    SUM(OverlappedMinutes) AS StaffMinutes
  FROM
  (
    SELECT 
      *,
      -- Calculate overlapped time in minutes
      DATEDIFF(MINUTE,
        CASE WHEN StartTime > l.StartTime THEN StartTime ELSE l.StartTime END, -- Get greatest start time
        CASE WHEN EndTime > l.EndTime THEN l.EndTime ELSE EndTime END -- Get least end time
      ) AS OverlappedMinutes,
      Cost / DATEDIFF(MINUTE, StartTime, EndTime) AS CostPerMinute
    FROM Staff 
    WHERE LocationId = l.LocationId 
      AND StartTime <= l.EndTime AND l.StartTime <= EndTime -- Match with overlapped time
  ) data
) StaffInLoc

SQL Fiddle

Best method to merge two sql tables toegether

Answers (2)

Related Questions