Reputation: 20342
If I run this:
Select *
FROM RAW_DATA
WHERE Portfolio like '%deposit%'
I get 131047 records. Now, I join to another table, and I want all records from RAW_DATA and matches from another table, like this:
Select *
FROM RAW_DATA AS RawData LEFT OUTER JOIN
DATAHIST AS HIST ON RawData.Parse2 = HIST.CONTACT_ID AND RawData.AsofDate = HIST.ASOFDATE
WHERE RawData.Portfolio like '%deposit%'
Now, my count blows up to 158745. If I want everything from Raw_Data and only matches from DATAHIST, how do I create the join line? There are only a couple options here.
Will I have to count rows, and select where rn = 1?
Upvotes: 1
Views: 2706
Reputation: 35603
A history table with only 1 row for any source row would be very unhelpful because a history table usually holds all history for each row of source data. So you should expect the number of rows to expand.
What I suspect you want is "the most recent entry" of history, and for such a need it will help to number the rows before joining, like this:
SELECT
*
FROM RAW_DATA AS rawdata
LEFT OUTER JOIN (
SELECT
*
, row_number() (PARTITION BY CONTACT_ID
ORDER BY ASOFDATE DESC) AS rn
DATAHIST
) AS hist ON rawdata.Parse2 = hist.CONTACT_ID
AND hist.rn = 1
WHERE rawdata.Portfolio LIKE '%deposit%'
So if there is more that one row in history for any row of rawdata, it will only permit joining to the most recent matching row in the history table.
Vary the order by to affect which rows are joined. e.g. by changing to ascending order you would get the "earliest" history instead of the "latest". If the asofdate
column is sufficient add others as tie breakers e.g. order by asofdate desc, ID desc
Upvotes: 1
Reputation: 27508
Determine which of the HIST
data has repetition (duplicates) in your join key (the combination of id and date). You should also check for duplicates in the RAW
data. Each key with duplicates will cause an n x m affect on the selection
drop table #RAW
drop table #HIST
create table #RAW (contact_id int, asofdate date);
insert into #RAW values
(1, '2017-01-01'),
(2, '2017-01-01')
create table #HIST (contact_id int, asofdate date, balance int);
insert into #HIST values
(1, '2017-01-01', 100), (1, '2017-01-01', 150), (1, '2017-01-02', 200),
(2, '2017-01-01', 125)
--select * from #RAW
--select * from #HIST
-- find duplicates in HIST
select distinct contact_id, asofdate, count(*) as occur_count
from #HIST
group by contact_id, asofdate
having count(*) > 1
-- because of the duplicates the result has 3 rows instead of the 2 you might be expecting
select raw.contact_id, raw.asofdate, hist.balance
from #raw raw left join #hist hist
on raw.contact_id = hist.contact_id and raw.asofdate = hist.asofdate
You probably want to clean that data or reduce it (per dlatikay) via aggregation or further select condition
Upvotes: 3