Reputation: 3
I'm currently expanding a customer list query I wrote a few months ago to include more information about last periodic review. Work is running our data warehouse on Teradata SQL. Here's a snippet of the code I'm working with, the actual query is about 200 lines.
SELECT DISTINCT
k.customerID
,k.name
,a.CountryCode
,CASE WHEN Account.actorID IS NOT NULL THEN 1 ELSE 0 END AS hasAccount
,id.ControlDate
,id.ControlBy
FROM customer k
LEFT JOIN agreement a ON k.actorID = a.actorID
LEFT JOIN identification id ON k.actorID = id.actorID
INNER JOIN (SELECT DISTINCT actorID, MAX(ControlDate) AS LastControl FROM identification GROUP BY actorID) id2
ON k.actorID = id2.actorID AND id.ControlDate = id2.LastControl
LEFT JOIN (SELECT DISTINCT actorID FROM agreement a WHERE a.activeAgreement = 'Y' and a.Product IN ('6774', '6775') Account ON k.actorid = Account.actorID
WHERE
k.customerstatus = 'Active'
;
The trouble is with the INNER JOIN
statement.
When I run this, I get 1769 rows, but if I remove the INNER JOIN
and the two id
. phrases in SELECT
, that sum pops up to 2117.
The difference is NULL
values on id.ControlDate
.
However, if I use a LEFT JOIN
instead of INNER JOIN
, I get approximately 6800 rows, since a lot of customers have been updated/performed control on multiple times.
How can I fix this?
Edit: to clarify, I want one row per actorID
where I either get the latest controlDate
or NULL
value
Edit2: as per request by @Thorsten Kettner, an explanation.
Both ActorID
and CustomerID
is unique per customer. However, this value is generated by the system, while customerID
usually is a Social Security number, company registration number etc. We tend to use CustomerID
as a lookup value in our CRM system.
One customer (or actor) can have many agreements depending on their relationship with us, and many periodic reviews, as we are required by law to perform portfolio audits regularly.
And here are some sample data:
1) Without INNER JOIN
statement:
actorID Customer_name Country hasAccount ControlDate ControlBy
278 228 BANK OF AMERICA NA TRADE OPERATION US 0 ? ?
275 330 Branch Banking And Trust Company US 0 04.02.2016 AD09853
275 169 CITIZENS Bank NA US 1 12.03.2018 AB96358
275 169 CITIZENS Bank NA US 1 16.11.2016 AB02890
275 169 CITIZENS Bank NA US 1 15.12.2015 AB62775
275 169 CITIZENS Bank NA US 1 11.10.2011 AB68786
264 072 Jp Morgan Chase Sec. Lending Asset Management US 0 11.10.2017 AB45546
264 061 International Development Association US 0 29.05.2018 AB45546
263 995 Zions Bancorporation N.A US 1 19.03.2015 AB43584
263 995 Zions Bancorporation N.A US 1 09.11.2016 AB02890
263 995 Zions Bancorporation N.A US 1 13.03.2018 AB45546
263 995 Zions Bancorporation N.A US 1 06.10.2011 AB68786
263 939 Citigroup Global Markets Inc US 1 22.12.2015 AB62775
263 939 Citigroup Global Markets Inc US 1 12.04.2012 AB68786
262 114 Prebon Financial Products Inc US 0 30.12.2015 AB24733
262 113 JP Morgan Securities LLC US 0 18.06.2018 AB45546
261 795 Federal Reserve System US 0 05.11.2015 AB62759
261 795 Federal Reserve System US 0 05.06.2014 AB31660
2) With INNER JOIN
statement:
actorID Customer_name Country hasAccount ControlDate ControlBy
275 330 Branch Banking And Trust Company US 0 04.02.2016 AD09853
275 169 CITIZENS Bank NA US 1 12.03.2018 AB96358
264 072 Jp Morgan Chase Sec. Lending Asset Management US 0 11.10.2017 AB45546
264 061 International Development Association US 0 29.05.2018 AB45546
263 995 Zions Bancorporation N.A US 1 13.03.2018 AB45546
263 939 Citigroup Global Markets Inc US 1 22.12.2015 AB62775
262 114 Prebon Financial Products Inc US 0 30.12.2015 AB24733
262 113 JP Morgan Securities LLC US 0 18.06.2018 AB45546
261 795 Federal Reserve System US 0 05.11.2015 AB62759
As you can see, actorID 278 228
vanishes, which is not good...
Upvotes: 0
Views: 2351
Reputation: 95031
You can use TOP 1 WITH TIES
on a ROW_NUMBER
sort to get the records with the latest dates per customer only.
select
c.customerid,
c.name,
a.countrycode,
case when c.actorid in
(select * from agreement where activeagreement = 'Y' and product in ('6774', '6775'))
then 1 else 0 end as hasaccount,
i.controldate,
i.controlby
from customer c
left join agreement a on a.actorid = c.actorid
left join
(
select top 1 with ties *
from identification
order by row_number() over (partition by actorid order by controldate desc)
) i on i.actorid = c.actorid
where c.customerstatus = 'Active';
UPDATE: Above answer didn't work for the OP, so I offered the following two alternatives that did work:
left join
(
select
actorid, controlby, controldate,
max(controlby) over (partition by actorid) as max_controldate
from identification
) i on i.actorid = c.actorid and i.controldate = i.max_controldate.
and
left join
(
select *
from identification
qualify row_number() over (partition by actorid order by controldate desc) = 1)
) i on i.actorid = c.actorid. – Thorsten
The last option with QUALIFY
is the teradata way to do this. QUALIFY
is a teradata extension to the SQL standard. The other two approaches are standard SQL.
Upvotes: 1
Reputation: 1151
Probably the quickest solution would be to use ISNULL.
Where you write MAX(ControlDate)
extend it with MAX(ISNULL(ControlDate,'1970-01-01'))
(or any default date that you have)
That would replace the NULLs and make your query work.
I hope it helps. Peter
Upvotes: 0