Why does my record count blow up with a left join?

Question

If I run this:

Select *
FROM RAW_DATA
WHERE Portfolio like '%deposit%'

I get 131047 records. Now, I join to another table, and I want all records from RAW_DATA and matches from another table, like this:

Select *
FROM RAW_DATA AS RawData LEFT OUTER JOIN
DATAHIST AS HIST ON RawData.Parse2 = HIST.CONTACT_ID AND RawData.AsofDate = HIST.ASOFDATE
WHERE RawData.Portfolio like '%deposit%'

Now, my count blows up to 158745. If I want everything from Raw_Data and only matches from DATAHIST, how do I create the join line? There are only a couple options here.

Will I have to count rows, and select where rn = 1?

Paul Maxwell · Accepted Answer

A history table with only 1 row for any source row would be very unhelpful because a history table usually holds all history for each row of source data. So you should expect the number of rows to expand.
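To confirm that repeated history rows are what inflates the count, a quick check is to look for join keys that occur more than once in DATAHIST. This is only a sketch: it assumes the table and column names from the question, and the alias history_rows is made up for illustration.

```sql
-- List every (CONTACT_ID, ASOFDATE) pair that appears more than once in DATAHIST.
-- Each pair returned here produces extra output rows for the matching RAW_DATA row
-- when joined on both columns, as in the question's LEFT OUTER JOIN.
SELECT
      CONTACT_ID
    , ASOFDATE
    , COUNT(*) AS history_rows   -- hypothetical alias
FROM DATAHIST
GROUP BY
      CONTACT_ID
    , ASOFDATE
HAVING COUNT(*) > 1
ORDER BY history_rows DESC;
```

If this returns no rows, the extra records come from somewhere other than duplicate keys in the history table.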

What I suspect you want is "the most recent entry" of history, and for such a need it will help to number the rows before joining, like this:

SELECT
      *
FROM RAW_DATA AS rawdata
LEFT OUTER JOIN (
      -- number the history rows per contact, newest first
      SELECT
            *
          , ROW_NUMBER() OVER (PARTITION BY CONTACT_ID
                               ORDER BY ASOFDATE DESC) AS rn
      FROM DATAHIST
) AS hist ON rawdata.Parse2 = hist.CONTACT_ID
      AND hist.rn = 1   -- keep only the most recent history row
WHERE rawdata.Portfolio LIKE '%deposit%'

So if there is more than one row in history for any row of rawdata, it will only permit joining to the most recent matching row in the history table.

Vary the ORDER BY to control which row is joined, e.g. by changing to ascending order you would get the "earliest" history instead of the "latest". If the asofdate column alone is not sufficient to single out one row, add other columns as tie-breakers, e.g. ORDER BY asofdate DESC, ID DESC
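The effect of a tie-breaker can be seen in a small self-contained sketch. The inline rows and the ID column are invented for illustration (the real DATAHIST may have no such column), and the dates are plain ISO-format strings so that they sort chronologically.

```sql
-- Hypothetical sample history: contact 'A' has two rows with the same ASOFDATE.
WITH hist AS (
      SELECT 1 AS ID, 'A' AS CONTACT_ID, '2020-01-31' AS ASOFDATE
      UNION ALL SELECT 2, 'A', '2020-02-29'
      UNION ALL SELECT 3, 'A', '2020-02-29'   -- ties with ID 2 on ASOFDATE
      UNION ALL SELECT 4, 'B', '2020-01-31'
)
SELECT
      ID
    , CONTACT_ID
    , ASOFDATE
FROM (
      SELECT
            hist.*
            -- ID DESC decides between rows that share the latest ASOFDATE
          , ROW_NUMBER() OVER (PARTITION BY CONTACT_ID
                               ORDER BY ASOFDATE DESC, ID DESC) AS rn
      FROM hist
) AS numbered
WHERE rn = 1
ORDER BY CONTACT_ID;
```

Without the ID DESC tie-breaker, either of the two rows dated 2020-02-29 could be numbered 1 for contact 'A', so the joined result may differ between runs.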
