SausageBuscuit
SausageBuscuit

Reputation: 1276

SQL - Oracle subquery filtering out data

I have a query in an Oracle database that I'm trying to figure out why it filters out data. This is something that a former employee used in a query to get certain customers. So I have data defined like this:

CUSTOMER_ID TYPE CHANGE_DATE
12356       T    12/1/2017
12356       T    6/1/2016
12356       T    11/11/2015
23452       T    7/1/2014
23452       T    7/1/2015
54674       T    1/1/2019
54674       T    11/1/2019
34653       T    12/1/2018
34653       T    1/1/2012

Now if I do a query to get just a basic max on it like this:

SELECT CUSTOMER_ID, "TYPE", MAX(CHANGE_DATE) FROM CUSTOMER_INFO WHERE TYPE = 'T'
GROUP BY CUSTOMER_ID, "TYPE"

it returns the max for each customer as expected.

CUSTOMER_ID TYPE CHANGE_DATE
12356       T    12/1/2017
23452       T    7/1/2015
54674       T    11/1/2019
34653       T    12/1/2018

However, the the query in question is structured like this, with a subselect to the same CUSTOMER_INFO table on the CHANGE_DATE field.

SELECT CUSTOMER_ID, "TYPE", CHANGE_DATE 
FROM CUSTOMER_INFO B 
WHERE "TYPE" = 'T' AND CHANGE_DATE = 
    (SELECT MAX(CHANGE_DATE) 
    FROM CUSTOMER_INFO 
    WHERE CUSTOMER_ID = B.CUSTOMER_ID)

This query cuts out half of the results, like this:

CUSTOMER_ID TYPE CHANGE_DATE
54674       T    11/1/2019
34653       T    12/1/2018

On the surface, I would have expected this to have a similar result to the first query since all of the items in CUSTOMER_INFO have a change date which could be the max. What is it about this change date comparison that is filtering out data in this way and why should it be used?

Upvotes: 0

Views: 703

Answers (2)

Andrew Sayer
Andrew Sayer

Reputation: 2336

SELECT CUSTOMER_ID, "TYPE", CHANGE_DATE 
FROM CUSTOMER_INFO B 
WHERE "TYPE" = 'T' AND CHANGE_DATE = 
    (SELECT MAX(CHANGE_DATE) 
    FROM CUSTOMER_INFO 
    WHERE CUSTOMER_ID = B.CUSTOMER_ID)

Is saying get the details for customers of type 'T' where their change_date is equal to the maximum change_date for any row for that customer (of any type). You'd also get all rows for that customer (of type 'T') that had the same change_date, rather than one row to represent them all (not sure what your primary key is here but potentially it would mean that this is not an actual concern).

Your simpler code does not worry that the maximum change_date could be from any type.

If you want to mimic the second query to require only one pass of your table, you could:

select * from 
(
select customer_id, type, change_date, rank() over (partition by customer_id order by change_date) rnk
from customer info 
)
where rnk = 1
and   type = 'T'

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269593

You need to filter in the subquery as well:

SELECT CUSTOMER_ID, "TYPE", CHANGE_DATE 
FROM CUSTOMER_INFO CI
WHERE "TYPE" = 'T' AND CHANGE_DATE = 
    (SELECT MAX(CI2.CHANGE_DATE) 
     FROM CUSTOMER_INFO CI2
     WHERE CI2."TYPE" = CI."TYPE" AND
           CI2.CUSTOMER_ID = CI.CUSTOMER_ID
    );

You can do this using just aggregation with KEEP:

SELECT CUSTOMER_ID,
       MAX("TYPE") KEEP (DENSE_RANK FIRST ORDER BY CHANGE_DATE DESC) as "TYPE",
       MAX(CHANGE_DATE)
FROM CUSTOMER_INFO WHERE TYPE = 'T'
GROUP BY CUSTOMER_ID

Upvotes: 1

Related Questions