Reputation: 1547
I have a query like this:
SELECT id, run_date, feed_type_id, text
FROM myTable
WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1
This returns me 230k rows.
When I wrap it with a WITH CLAUSE:
WITH daily_run as (
SELECT /* MATERIALZE */
id, run_date, feed_type_id, text
FROM myTable
WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1)
SELECT
id, run_date, feed_type_id, text from daily_run run;
This also returns me 230k rows.
However when I do a join:
WITH daily_run as (
SELECT /* MATERIALZE */
id, run_date, feed_type_id, text
FROM myTable
WHERE run_date >= ('20140506', 'yyyymmdd') AND run_date < ('20140506', 'yyyymmdd') + 1)
SELECT
run.id, run.run_date, run.feed_type_id, run.text
FROM daily_run run
INNER JOIN feed_id_types types
ON run.feed_type_id = types.feed_type_id
I get an increase of 50k rows. This increase in rows happens no matter what date I run it for (not always 50k).
The other confusing part is replacing the join with a different condition:
SELECT
run.id, run.run_date, run.feed_type_id, run.text
FROM daily_run run
WHERE run.feed_type_id in (SELECT types.feed_type_id FROM feed_id_types types)
Returns the correct 230k number.
The table feed_id_types has 19 rows, and I want to join it to determine if I need to process that particular run.
Is there something I am missing from my join condition?
Upvotes: 0
Views: 2765
Reputation: 1491
You have duplicate rows in feed_id_types. Run this to find which IDs are duplicated:
select
types.feed_type_id
from feed_id_types types
group by types.feed_type_id
having count(*) > 1
The IN()
clause ignores the duplicates, matching on the first one it finds. The inner join matches each row from daily_run
to every matching row in feed_id_types
, creating extra results.
Upvotes: 2