Reputation: 13
I am fiddling with MS SQL and I am getting different results after running these two queries (...maybe it is just my amateurism in MS SQL):
/* If you spot any syntax errors it doesn't matter, I am writing this out of my head */
SELECT
AVG(X.AvgDailyExpense) AS AverageDailyExpense
FROM
(SELECT
AVG(we.DailyExpense) AS AvgDailyExpense
FROM
WorkerExpense we
LEFT JOIN Worker w ON w.Id = we.WorkerId
GROUP BY
w.Id) X;
SELECT
AVG(we.DailyExpense) AS AverageDailyExpense
FROM
WorkerExpense we
LEFT JOIN Worker w ON w.Id = we.WorkerId;
There is a foreign key between WorkerExpense and Worker and it is impossible for WorkerExpense table to have a reference to worker row that doesn't exist. Also, DailyExpense is money data type (maybe that matters?).
Now, the lower query is returning what should be the correct result (I calculated the result manually on small number of rows) and upper query is always returning a value greater than it should be.
Could anyone please explain it a bit more detailed why is this happening? Is it because of number rounding or...?
Upvotes: 1
Views: 451
Reputation: 16364
Suppose that you have workers with the following expenses:
A: 1, 2, 3, 4, 5, 6, 7, 8, 9 (average 45 / 9 = 5)
B: 12, 13, 14 (average 39 / 3 = 13)
The straight average of all the expenses is (45 + 39) / 12 = 7, but the average of the averages of the two workers is (5 + 13) / 2 = 9.
Upvotes: 1
Reputation: 2870
Those are two different mathematical questions. Consider one Worker group with two WorkerExpense entries, which are 0 and 3000. Consider a second Worker group with a single WorkerExpense Entry, which is 0.
Your first query will find the averages of the two groups first, e.g., 1500 and 0, and then average those numbers, for a result of 750.
Your second query will average the three numbers 0, 3000, and 0, for a result of 1000.
These are two different things. You have to decide which one you intend.
Upvotes: 1
Reputation: 2638
These queries will return different results because they are answering different questions. The second query answers "what is the average daily expense across all daily expense reports". The first query first finds the average daily expense by individual and then asks "what is the average of the individual-based average daily expense". They are answering very different questions.
Another way to think about it is that the the second query is giving greater weight to individuals that have a larger number of expense reports. The first query normalizes the data by individual.
Upvotes: 1