Jury A
Jury A

Reputation: 20062

Why I get different result when using inner join

I have a table that contains the following columns; a, b, e and another table, table2 that contains c,d. Both tables have a column name.

I made a query:

SELECT distinct a, b
from db.table
where e <>'65';

And I got say, 1885 records

Then, I needed more information from another table (c, d). I used inner join to get these additional information. The query becomes:

SELECT distinct a, b, c, d 
from db.table
inner join db.table2
on table.name=table2.name2
where e <>'65';

Now, I was expecting the same result but with additional columns (c, d). But I got a different number of results: 51144. Can any body explain please why the number increased while I just added more columns and did not change the condition ?

Upvotes: 3

Views: 470

Answers (5)

John Woo
John Woo

Reputation: 263693

It's because the DISTINCT keyword applies to the whole row.

Say you have this results:

a       b        c         d
=====================================
1       1        1         1
1       1        3         2
1       1        4         2
1       2        5         2
1       2        6         6
1       3        1         1

if you apply DISTINCT a, b, it should result:

a       b     
==============
1       1             
1       2      
1       3    

because two of the records were duplicate of 1, 1 . It will return all records if you say DISTINCT a, b, c ,d.

One other reason why your query returns multiple records is because it might be that the records of tableA has many corresponding matches on the records of tableB. Let's say on tableA you have only 1 record but in tableB that record has corresponding match of 5 records. If you try to join the tables, instead of 1 record from tableA, it should now return 5 rows because of the matches.

Upvotes: 0

Adriaan Stander
Adriaan Stander

Reputation: 166356

DISTINCT returns as the key word says, the distinct results from the select.

The fact that you are adding additional columns to the distinct select will possibly cause the distinct result set to be more rows. Lets have a look at an example

Distinct from

A   B
1   2
1   3
1   2

would be

A   B
1   2
1   3

but now adding additional columns to this, let say

A   B   C
1   2   1
1   3   2
1   2   3

will result in

A   B   C
1   2   1
1   3   2
1   2   3

Further more, your inner join can limit the result set, as inner join will only return values that are in both table1 and table2, so it a given value is present in table1, but not in table2 it will not be returned.

Or as was mentioned by @zerkms, if there were 2 keys defining the relationship between the 2 tables, you might get more that you expected.

Upvotes: 0

Jay
Jay

Reputation: 93

What is surprising? You have 2 more columns added to your select statement and another table.

As an example distinct will apply to the 4 columns (a,b,c,d) not being repeated

If you're table is like this

A B C D E
1 2 1 1
1 2 3 4
1 2 4 5
1 2 4 5

You're first statement will select distinct A,B which is only 1,2. The second statement will select 1,2,1,1 - 1,2,3,4 - 1,2,4,5 as long as it fulfills condition e <> 65

For the rest you need to use the ON to be more specific with regards to your select.

Upvotes: 0

xkeshav
xkeshav

Reputation: 54016

INNER JOIN will return only those result which match the same value of given column ( used in ON condition ) in both tables.

Upvotes: 1

zerkms
zerkms

Reputation: 254886

It is called cartesian product and practically it means that you likely need to be more precise on the join ON conditions.

The same result would be only possible if table2 table contains one and only one record that matches condition table.name=table2.name2 per each row from table table

Upvotes: 1

Related Questions