user11605231
user11605231

Reputation:

SQL avg() Incorrect

The first table is the raw data, and the second table is the average of the numbers from the first table. The average should be 733.43, but is coming out to 760.57575. Can anyone help me understand why these numbers are different?

Queries

I know that there are other similar questions, but this is a lot simpler than those.

Edit: The following query solves my problem

SELECT st.begin_ts, sum(st.count) / count(st.count) as average

FROM (
    SELECT begin_ts, process_name, avg(count) as count
    FROM ecn_stats_2019_06_18
    WHERE process_name LIKE 'matching_%_gw' and name = 'raw_msg_count' and begin_ts <= '2019-06-18 07:00:00'
    GROUP BY begin_ts, process_name) as st

GROUP BY begin_ts;

Upvotes: 0

Views: 349

Answers (1)

Abelisto
Abelisto

Reputation: 15614

Ok, lets remember the school arithmetic. You have avg() in both queries and you assumes that the sum of avgs from first query divided by rows count in it will be equal to sum of actual values divided by actual rows count. Something like:

Firs query:

row1, (a + b) / 2
row2, (c + d + e) / 3

and second query, you think, should to return

((a + b) / 2 + (c + d + e) / 3) / 2

Yes? Lets expand the last expression:

((a + b) / 2 + (c + d + e) / 3) / 2 = (a + b) / 4 + (c + d + e) / 6 = a/4 + b/4 + c/6 + d/6 + e/6

But your second query actually calculates something like

(a + b + c + d + e) / 5 = a/5 + b/5 + c/5 + d/5 + e/5

which obviously is not equal to the previous one.

Example:

with t(a, b, c) as (values(1,1,2.0), (1,1,3.0), (1,2,4.0), (1,2,5.0), (1, 2, 6.0))
select a, avg(c) from t group by a, b;
 a |        avg         
---+--------------------
 1 | 5.0000000000000000
 1 | 2.5000000000000000
(2 rows)

According to the your assuming the result of query

with t(a, b, c) as (values(1,1,2.0), (1,1,3.0), (1,2,4.0), (1,2,5.0), (1, 2, 6.0))
select a, avg(c) from t group by a;

should be (5 + 2.5) / 2 = 3.75 But it is actually is 4 (just check)

Upvotes: 2

Related Questions