Katalin Szeghalmi
Katalin Szeghalmi

Reputation: 73

Average calculates weighted average in SQL Server

Could someone explain me why the AVG() function gives me a weighted average in my code?

SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
    INTO #totalpaid 
    FROM oc.students AS s
    JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
GROUP BY s.country ,s.stud_id;

SELECT DISTINCT s.country, ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
    FROM #totalpaid t
    JOIN oc.students s ON t.stud_id = s.stud_id
    JOIN oc.enrollment e ON e.stud_id = s.stud_id; 

For example in the case of Malta Student12 attended 1 course and paid EUR45,Student837 attended 7 courses and paid EUR294 in total. I would like to have a simple (45+294)/2 calculation for the average however the system calculates like (1*45+7*294)/8. What am I doing wrong? enter image description here

Upvotes: 1

Views: 310

Answers (4)

Katalin Szeghalmi
Katalin Szeghalmi

Reputation: 73

In the meantime I found the solution:

 SELECT
      country,
      ROUND(AVG(totalpaid) OVER (PARTITION BY country),0) AS avg_country
    FROM #totalpaid;

Super easy :)

Upvotes: 1

Wouter
Wouter

Reputation: 2976

SELECT s.country, sum(e.paid) / count(DISTINCT s.stud_id) as average
FROM oc.students s 
JOIN oc.enrollment e ON e.stud_id = s.stud_id
GROUP BY s.country; 

Upvotes: 1

Eric Brandt
Eric Brandt

Reputation: 8101

In your second query, when you join your temp table back to enrollment, it's generating one row per class; which is where the multiple values in the totalpaid column are coming from.

The second query isn't using any columns that aren't already in the temp table, so you don't need those joins at all. This should produce what you're looking for.

SELECT 
  t.country, 
  ROUND(AVG(t.totalpaid) OVER (PARTITION BY t.country),0) AS avg_country
FROM #totalpaid t
GROUP BY 
  t.country;

Upvotes: 2

The Impaler
The Impaler

Reputation: 48770

Because you are joining your tables twice.

By putting together the INSERT and SELECT statements your query is equivalent to:

SELECT
  DISTINCT s.country, 
  ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
FROM (
  SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
  FROM oc.students AS s
  JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
  GROUP BY s.country ,s.stud_id    
) t
JOIN oc.students s ON t.stud_id = s.stud_id
JOIN oc.enrollment e ON e.stud_id = s.stud_id

There you can clearly see the table students and enrollment are joined twice. That would produce a skewed average function.

Upvotes: 2

Related Questions