jmchle
jmchle

Reputation: 11

Simplifying Correlated Nested Subqueries

Can someone simplify how and why we use a Correlated Nested Subquery? I understand nested queries but can't seem to wrap my head around the logic for a correlated nested query.

Here is an example. I have two tables for a movie renting database.

Table Renting has information about renting a movie. Table Customers have information about the customers renting it.

Here is what I have for a Nested Query:

SELECT *
FROM CUSTOMERS
WHERE CUSTOMER_ID IN
-- nested to find customers with less than 5 rentals
    (SELECT CUSTOMER_ID
    FROM RENTING
    GROUP BY CUSTOMER_ID
    HAVING COUNT(*) < 5) 

This outputs 59 rows.

Here is the Correlated Nested Query:

-- Select customers with less than 5 movie rentals
SELECT *
FROM CUSTOMERS as c
WHERE 5 >
    (SELECT count(*)
    FROM renting as r
    WHERE r.customer_id = C.CUSTOMER_ID);

This outputs 66 rows.

If someone can simplify the logic. I was told the Nested Correlated Query is correct, but shouldn't my first query also yield the same result and logic. If so, why is the result different?

Thanks in advance.

Upvotes: 0

Views: 341

Answers (2)

Haleemur Ali
Haleemur Ali

Reputation: 28253

In the second query you are additionally selecting customers who have 0 records in the renting table.

here is a minimum example showing this behaviour:

create table customers (customer_id int);
insert into customers values (1), (2), (3), (4);
create table renting (customer_id int);
insert into renting values (1), (1), (1), (1), (1), (1);
insert into renting values (2), (2), (2);
insert into renting values (3);

your first subquery

SELECT CUSTOMER_ID
FROM RENTING
GROUP BY CUSTOMER_ID
HAVING COUNT(*) < 5

returns

 customer_id 
-------------
           3
           2
(2 rows)

while your second subquery returns is evaluated for each customer, and for customers who have 0 records in renting, the count(*) is evaluated to 0. example:

select count(*) from renting where customer_id = 4;
 count 
-------
     0
(1 row)

thus, the 2nd query includes customers 2, 3, 4, while the first query only includes customers 2 & 3

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

The difference is customers who have no rows in renting.

They are not in your in list, so they are not included in the first query.

For the second query, no rows match . . . but the correlated subquery still returns 0 because an aggregation query with no group by is guaranteed to return one row.

Upvotes: 1

Related Questions