Ole Lynge
Ole Lynge

Reputation: 4587

Sql query with joins between four tables with millions of rows

We have a transact sql statement that queries 4 tables with millions of rows in each.

It takes several minutes, even though it has been optimized with indexes and statistics according to TuningAdvisor.

The structure of the query is like:

SELECT E.EmployeeName
    , SUM(M.Amount) AS TotalAmount
    , SUM(B.Amount) AS BudgetAmount
    , SUM(T.Hours) AS TotalHours
    , SUM(TB.Hours) AS BudgetHours
    , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M
    ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B
    ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T
    ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB
    ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName

Since each transaction table contains millions of rows, I consider splitting it up into one query per transaction table, using table variables like @real, @budget, and @hours, and then joining these in a final SELECT. But in tests it seems to not speed up.

How would you deal with that in order to speed it up?

Upvotes: 1

Views: 2248

Answers (3)

Stefanos Zilellis
Stefanos Zilellis

Reputation: 611

Try this one:

SELECT E.EmployeeName, TA.TotalAmount, BA.BudgetAmount, TWH.TotalHours, BH.BudgetHours, TWH.WastedHours
FROM Employees E
LEFT JOIN 
(SELECT E.EmployeeID, SUM(M.Amount) AS TotalAmount
FROM Employees E INNER JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID GROUP BY E.EmployeeID)TA
ON E.EmployeeID = TA.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(B.Amount) AS BudgetAmount
FROM Employees E INNER JOIN BudgetTransactions B ON E.EmployeeID = B.EmployeeID GROUP BY E.EmployeeID)BA
ON E.EmployeeID = BA.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(T.Hours) AS TotalHours , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E INNER JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID GROUP BY E.EmployeeID)TWH
ON E.EmployeeID = TWH.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(TB.Hours) AS BudgetHours
FROM Employees E INNER JOIN TimeBudgetTransactions TB ON E.EmployeeID = TB.EmployeeID GROUP BY E.EmployeeID)BH
ON E.EmployeeID = BH.EmployeeID

Upvotes: 0

Bojan Milenkoski
Bojan Milenkoski

Reputation: 501

I don't know if you have all the indexes on your tables that will speed up things, but having big tables could have this impact on a query time. I would recommend partitioning the tables if possible. It is more work, but everything you do to speed up the query now it won't be enough after few millions new records.

Upvotes: 1

Quassnoi
Quassnoi

Reputation: 425401

I'm not sure the query you posted will yield the results you're expecting.

It will cross join all the dimension tables (MoneyTransactions etc.) and multiply all the results.

Try this:

SELECT  E.EmployeeName,
        (
        SELECT  SUM(amount)
        FROM    MoneyTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalAmount,
        (
        SELECT  SUM(amount)
        FROM    BudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetAmount,
        (
        SELECT  SUM(hours)
        FROM    TimeTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalHours,
        (
        SELECT  SUM(hours)
        FROM    TimeBudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetHours
FROM    Employees E

Upvotes: 8

Related Questions