guigoz
guigoz

Reputation: 704

Why does count(distinct ..) return different values on the same table?

select count(distinct a,b,c,d) from mytable;
select count(distinct concat(a,'-',b),concat(c,'-',d)) from mytable;

Since '-' never appears in a,b,c,d fields, the 2 queries above should return the same result. Am I right ?

Actually it is not the case, the difference is 4 rows out of ~60M and I cant figure out how this is possible

Any idea or example ? Thanks

Upvotes: 1

Views: 1454

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269693

First, I am assuming that you are using MySQL, because that is the only database of your original tags where your syntax would be accepted.

Second, this does not directly answer your question. Given your types and expressions, I do not see how you can get different results. However, very similar constructs can produce different results.

It is very important to note that NULL is not the culprit. If any argument is NULL for either COUNT(DISTINCT) or CONCAT(), then the result is NULL -- and NULLs are not counted.

However, spaces at the end of strings can be an issue. Consider the results from this query:

select count(distinct x, y),
       count(distinct concat(x, '-', y)), 
       count(distinct concat(y, '-', x))
from (select 1 as x, 'a' as y union all
      select 1, 'a ' union all
      select 1, NULL
     ) a

I would expect the second and third arguments to return the same thing. But spaces at the end of the string cause differences. COUNT(DISTINCT) ignores them. However, CONCAT() will embed them in the string. Hence, the above returns

1     1     2

And the two values are different.

In other words, two values may not be exactly the same, but COUNT(DISTINCT) might regard them as the same. Spaces are one example. Collations are another potential culprit.

Upvotes: 1

Krupa
Krupa

Reputation: 455

Take example of sample data

A   B   C   D
1   2   3   4
5   6   7   8
1   2   5   7
1   2   5   7
1   3   3   4
1   3   3   4

then count (distinct (a, b, c, d)) = 4

A   B   C   D
1   2   3   4
5   6   7   8
1   2   5   7
1   3   3   4

and count (distinct (a,-,b), distinct (c,-,d)) = 3

dist (a,-,b)        dist (c,-,d)    
1   2               3   4
5   6               7   8
1   3               5   7

Upvotes: 0

Related Questions