Reputation: 20062
I have a table that contains the following columns; a, b, e and another table, table2 that contains c,d. Both tables have a column name
.
I made a query:
SELECT distinct a, b
from db.table
where e <>'65';
And I got say, 1885 records
Then, I needed more information from another table (c, d). I used inner join to get these additional information. The query becomes:
SELECT distinct a, b, c, d
from db.table
inner join db.table2
on table.name=table2.name2
where e <>'65';
Now, I was expecting the same result but with additional columns (c, d). But I got a different number of results: 51144. Can any body explain please why the number increased while I just added more columns and did not change the condition ?
Upvotes: 3
Views: 470
Reputation: 263693
It's because the DISTINCT
keyword applies to the whole row.
Say you have this results:
a b c d
=====================================
1 1 1 1
1 1 3 2
1 1 4 2
1 2 5 2
1 2 6 6
1 3 1 1
if you apply DISTINCT a, b
, it should result:
a b
==============
1 1
1 2
1 3
because two of the records were duplicate of 1, 1
. It will return all records if you say DISTINCT a, b, c ,d
.
One other reason why your query returns multiple records is because it might be that the records of tableA
has many corresponding matches on the records of tableB
. Let's say on tableA
you have only 1 record but in tableB
that record has corresponding match of 5 records. If you try to join the tables, instead of 1 record from tableA, it should now return 5 rows
because of the matches.
Upvotes: 0
Reputation: 166356
DISTINCT returns as the key word says, the distinct results from the select.
The fact that you are adding additional columns to the distinct select will possibly cause the distinct result set to be more rows. Lets have a look at an example
Distinct from
A B
1 2
1 3
1 2
would be
A B
1 2
1 3
but now adding additional columns to this, let say
A B C
1 2 1
1 3 2
1 2 3
will result in
A B C
1 2 1
1 3 2
1 2 3
Further more, your inner join can limit the result set, as inner join will only return values that are in both table1 and table2, so it a given value is present in table1, but not in table2 it will not be returned.
Or as was mentioned by @zerkms, if there were 2 keys defining the relationship between the 2 tables, you might get more that you expected.
Upvotes: 0
Reputation: 93
What is surprising? You have 2 more columns added to your select statement and another table.
As an example distinct will apply to the 4 columns (a,b,c,d) not being repeated
If you're table is like this
A B C D E
1 2 1 1
1 2 3 4
1 2 4 5
1 2 4 5
You're first statement will select distinct A,B which is only 1,2. The second statement will select 1,2,1,1 - 1,2,3,4 - 1,2,4,5 as long as it fulfills condition e <> 65
For the rest you need to use the ON to be more specific with regards to your select.
Upvotes: 0
Reputation: 54016
INNER JOIN will return only those result which match the same value of given column ( used in ON
condition ) in both tables.
Upvotes: 1
Reputation: 254886
It is called cartesian product and practically it means that you likely need to be more precise on the join ON
conditions.
The same result would be only possible if table2
table contains one and only one record that matches condition table.name=table2.name2
per each row from table
table
Upvotes: 1