ekeckeisen
ekeckeisen

Reputation: 75

Using NOT EXISTS with a subquery

I'm trying to create a query to retrieve all customers who purchased one SKU ('Red399') but did not also purchase a second SKU ('Red323'). It seems like the best way to do this is by using NOT EXISTS with a subquery to filter out people who bought 'Red323'.

My query isn't returning any errors, but it also isn't returning any results, and I think it might be because I have too many conditions in the initial WHERE clause but I'm not sure:

SELECT DISTINCT o."FirstName", o."LastName", o."Email", ol."SKU"
FROM flight_export_order o
JOIN flight_export_orderline ol
    ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red399'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
AND NOT EXISTS 
(SELECT DISTINCT o."Email" 
    FROM flight_export_order o
    JOIN flight_export_orderline ol
        ON o."OrderDisplayID" = ol."OrderDisplayID"
    WHERE ol."SKU" = 'Red323'
    AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020')

Upvotes: 1

Views: 81

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656481

You were on the right track to begin with. EXISTS / NOT EXISTS are the right tools:

SELECT o.* --  or just the columns you need
FROM   flight_export_order o
WHERE  o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
AND    EXISTS (
   SELECT FROM flight_export_orderline
   WHERE  "OrderDisplayID" = o."OrderDisplayID"
   AND    "SKU" = 'Red399'
   )
AND    NOT EXISTS (
   SELECT FROM flight_export_orderline
   WHERE  "OrderDisplayID" = o."OrderDisplayID"
   AND    "SKU" = 'Red323'
   );

With a multicolumn index on flight_export_orderline ("OrderDisplayID", "SKU"), this is as fast as it gets. An index on just ("OrderDisplayID") (like you probably have) goes a long way, too.

Plus an index on flight_export_order("OrderDate"), obviously.

I see no need for any expensive aggregating or DISTINCT. See:

Aside 1: try to avoid quoted CaMeL-case identifiers in Postgres if you can. See:

Aside 2: It's reommended to use ISO 8601 date format (YYYY-MM-DD), which is always unambiguous and independent of locale and session settings.

Upvotes: 1

forpas
forpas

Reputation: 164089

You don't need the subquery.
You can group by customer and set the conditions in the having clause:

SELECT o."FirstName", o."LastName", o."Email"
FROM flight_export_order o INNER JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" IN ('Red399', 'Red323')
  AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
GROUP BY o."FirstName", o."LastName", o."Email"
HAVING SUM((ol."SKU" = 'Red399')::int) > 0
   AND SUM((ol."SKU" = 'Red323')::int) = 0

Upvotes: 1

Related Questions