Reputation: 75
I'm trying to create a query to retrieve all customers who purchased one SKU ('Red399') but did not also purchase a second SKU ('Red323'). It seems like the best way to do this is by using NOT EXISTS
with a subquery to filter out people who bought 'Red323'.
My query isn't returning any errors, but it also isn't returning any results, and I think it might be because I have too many conditions in the initial WHERE
clause but I'm not sure:
SELECT DISTINCT o."FirstName", o."LastName", o."Email", ol."SKU"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red399'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
AND NOT EXISTS
(SELECT DISTINCT o."Email"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red323'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020')
Upvotes: 1
Views: 81
Reputation: 656481
You were on the right track to begin with. EXISTS
/ NOT EXISTS
are the right tools:
SELECT o.* -- or just the columns you need
FROM flight_export_order o
WHERE o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
AND EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red399'
)
AND NOT EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red323'
);
With a multicolumn index on flight_export_orderline ("OrderDisplayID", "SKU")
, this is as fast as it gets. An index on just ("OrderDisplayID")
(like you probably have) goes a long way, too.
Plus an index on flight_export_order("OrderDate")
, obviously.
I see no need for any expensive aggregating or DISTINCT
. See:
Aside 1: try to avoid quoted CaMeL-case identifiers in Postgres if you can. See:
Aside 2: It's reommended to use ISO 8601 date format (YYYY-MM-DD
), which is always unambiguous and independent of locale and session settings.
Upvotes: 1
Reputation: 164089
You don't need the subquery.
You can group by customer and set the conditions in the having
clause:
SELECT o."FirstName", o."LastName", o."Email"
FROM flight_export_order o INNER JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" IN ('Red399', 'Red323')
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
GROUP BY o."FirstName", o."LastName", o."Email"
HAVING SUM((ol."SKU" = 'Red399')::int) > 0
AND SUM((ol."SKU" = 'Red323')::int) = 0
Upvotes: 1