Inner join returning more rows then regular select

Question

I have a query like this:

SELECT id, run_date, feed_type_id, text 
FROM myTable
WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1

This returns me 230k rows.

When I wrap it with a WITH CLAUSE:

WITH daily_run as (
  SELECT     /* MATERIALZE */
    id, run_date, feed_type_id, text 
  FROM myTable
  WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1)

SELECT 
id, run_date, feed_type_id, text from daily_run run;

This also returns me 230k rows.

However when I do a join:

WITH daily_run as (
SELECT /* MATERIALZE */
id, run_date, feed_type_id, text 
FROM myTable
WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1)

SELECT 
run.id, run.run_date, run.feed_type_id, run.text 
FROM daily_run run 
  INNER JOIN feed_id_types types 
  ON run.feed_type_id = types.feed_type_id

I get an increase of 50k rows. This increase in rows happens no matter what date I run it for (not always 50k).

The other confusing part is replacing the join with a different condition:

SELECT 
  run.id, run.run_date, run.feed_type_id, run.text 
FROM daily_run run 
WHERE run.feed_type_id in (SELECT types.feed_type_id FROM feed_id_types types)

Returns the correct 230k number.

The table feed_id_types has 19 rows, and I want to join it to determine if I need to process that particular run.

Is there something I am missing from my join condition?

Michael Green · Accepted Answer

You have duplicate rows in feed_id_types. Run this to find which IDs are duplicated:

select
    types.feed_type_id
from feed_id_types types
group by types.feed_type_id
having count(*) > 1

The IN() clause ignores the duplicates, matching on the first one it finds. The inner join matches each row from daily_run to every matching row in feed_id_types, creating extra results.

Inner join returning more rows then regular select

Answers (1)

Related Questions