Jury A
Jury A

Reputation: 20062

Difference in queries results with group by and without it

I have two tables: t1 with the following columns: name | key | length
t2 with the following columns: name | country.

I need to select all distinct keys with length>2000 group by country. So, I made

SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;

But, when I make the query:

SELECT count(distinct key)
from db.t1
where Length>2000;

I am supposed to get equal results but I'm getting different results. For example, in the first query, I get 125494 and in the second I get: 121653.

What is the reason for this different results?? Knowing that there are some fields in the country are ''. It seems to me they don't appear as a group and i counted them and found that they are 134 records. but I can't find out the reason.

Upvotes: 0

Views: 44

Answers (2)

eggyal
eggyal

Reputation: 125865

Unless key is UNIQUE (in which case, why bother with the DISTINCT keywords?), there is no reason that your two queries should return the same results.

Suppose t1 contains:

+------+-----+--------+
| name | key | length |
+------+-----+--------+
|    a |   x |   5000 |
|    b |   x |   5000 |
|    b |   y |   5000 |
|    c |   z |   5000 |
+------+-----+--------+

And t2 contains:

+------+---------+
| name | country |
+------+---------+
|    a |      uk |
|    b |      fr |
|    c |      de |
+------+---------+

Then your queries will return:

  1. First query:

    SELECT count(distinct key), country
    from db.t1
    inner join db.t2
    on t1.name=t2.name
    where length>2000
    group by country;
    

    Will yield:

    +---------------------+---------+
    | count(distinct key) | country |
    +---------------------+---------+
    |                   1 |      uk |
    |                   2 |      fr |
    |                   1 |      de |
    +---------------------+---------+
    
  2. Second query:

    SELECT count(distinct key)
    from db.t1
    where Length>2000;
    

    Will yield:

    +---------------------+
    | count(distinct key) |
    +---------------------+
    |                   3 |
    +---------------------+
    

See it on sqlfiddle.

Upvotes: 1

Tieran
Tieran

Reputation: 1036

If you have multiple rows in t2 with the same name the join will be creating duplicates.

Upvotes: 0

Related Questions