Fllappy
Fllappy

Reputation: 401

Joined query producing more results compared to solo query

I am performing the following query which has an inner join against another table.

select count(myTable.name) 
from sch2.sample_detail as myTable 
inner join sch1.otherTable as otherTable on myTable.name = otherTable.name 
where otherTable.is_valid = 1 
  and myTable.name IS NOT NULL;

This produces a count of 4912304.

The following is a query just on a single table (my table).

SELECT COUNT(myTable.name) 
from sch2.sample_detail as myTable 
where myTable.name IS NOT NULL;

This produces a count of 2864654.

But how is this possible? Both queries have the clause where myTable.name IS NOT NULL.

Shouldn't the second query produce same results or if not even more cos the second query doesn't have the otherTable.is_valid = 1 clause? Why does the inner join produces a higher count of result?

Please advice if there is something I should amend in the 1st query, thanks.

Upvotes: 0

Views: 48

Answers (1)

leftjoin
leftjoin

Reputation: 38335

Inner, left or cross join can duplicate rows. sch1.otherTable.name is not unique and this causing rows duplication because for each row in left table all corresponding rows from right table are being selected, this is normal join behavior.

To get duplicate names list use this query and decide how to remove duplicated rows: filter or distinct or filter by row_number, etc.

select count(*) cnt,
       name
  from sch1.otherTable
having count(*)>1
order by cnt desc;

If you need EXISTS (and do not need to select columns from otherTable), use left semi join. Also subquery with distinct can be used to pre-aggregate name before join and filter:

select count(myTable.name) 
from sch2.sample_detail as myTable 
LEFT SEMI JOIN (select distinct name from sch1.otherTable otherTable where otherTable.is_valid = 1 ) as otherTable on myTable.name = otherTable.name 
where myTable.name IS NOT NULL;

Upvotes: 2

Related Questions