Jørgen R
Jørgen R

Reputation: 10806

SELECT distinct gives wrong count

I've got a problem where my count(*) will return the number of rows before distinct rows are filtered.

This is a simplified version of my query. Note that I'll extract a lot of other data from the tables, so group by won't return the same result, as I'd have to group by maybe 10 columns. The way it works is that m is a map mapping q, c and kl, so there can be several references to q.id. I only want one.

SELECT distinct on (q.id) count(*) over() as full_count
from q, c, kl, m 
where c.id = m.chapter_id
    and q.id = m.question_id
    and q.active = 1
    and c.class_id = m.class_id
    and kl.id = m.class_id
order by q.id asc

If I run this i get full_count = 11210 while it only returns 9137 rows. If I run it without the distinct on (q.id), distinct on (q.id) is indeed the number of rows.

So it seems that the count function doesn't have access to the filtered rows. How can I solve this? Do I need to rethink my approach?

Upvotes: 0

Views: 2237

Answers (2)

user330315
user330315

Reputation:

I'm not entirely sure what exactly you are trying to count, but this might get you started:

select id, 
       full_count,
       id_count
from (
    SELECT q.id, 
           count(*) over() as full_count,
           count(*) over (partition by q.id) as id_count,
           row_number() over (partition by q.id order by q.id) as rn
    from q
      join m on q.id = m.question_id
      join c on c.id = m.chapter_id and c.class_id = m.class_id
      join kl on kl.id = m.class_id
    where q.active = 1
) t
where rn = 1
order by q.id asc

If you need the count per id, then the column id_count would be what you need. If you need the overall count, but just on row per id, then the full_count is probably what you want.

(note that I re-wrote your implicit join syntax to use explicit JOINs)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270091

Can you use a subquery:

select qid, count(*) over () as full_count
from (SELECT distinct q.id
      from q, c, kl, m 
      where c.id = m.chapter_id
            and q.id = m.question_id
            and q.active = 1
            and c.class_id = m.class_id
            and kl.id = m.class_id
     ) t
order by q.id asc

But the group by is the right approach. The key word distinct in selectis really just syntactic sugar for doing a group by on all non-aggregate-functioned columns.

Upvotes: 0

Related Questions