Reputation: 42099
I'm having a slow start to the morning. I thought there was a more efficient way to make the following query using a join, instead of two independent selects -- am I wrong?
Keep in mind that I've simplified/reduced my query into this example for SO purposes, so let me know if you have any questions as well.
SELECT DISTINCT c.*
FROM customers c
WHERE c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
AND c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;
-- Simple tables to demonstrate point
CREATE TABLE customers (
customer_id serial,
name text
);
CREATE TABLE customers_cars (
customer_id integer,
car_make text
);
-- Populate tables
INSERT INTO customers(name) VALUES
('Joe Dirt'),
('Penny Price'),
('Wooten Nagen'),
('Captain Planet')
;
INSERT INTO customers_cars(customer_id,car_make) VALUES
(1,'BMW'),
(1,'Merc'),
(1,'Ford'),
(2,'BMW'),
(2,'BMW'), -- Notice car_make is not unique
(2,'Ferrari'),
(2,'Porche'),
(3,'BMW'),
(3,'Ford');
-- ids 1 and 3 both have BMW and Ford
Upvotes: 2
Views: 101
Reputation: 1308
It seems to me that you are trying to find customers that has at least 1 BMW and at least 1 Ford car. This query should get that for you:
SELECT
customers.customer_id
FROM
customers
INNER JOIN customer_cars ON
customers.customer_id = customer_cars.customers_id
AND customer_cars.car_make IN ('BMW', 'Ford')
GROUP BY
customers.customer_id
HAVING
COUNT(CASE WHEN car_make = 'BMW' THEN 1 ELSE NULL END) > 0
AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 ELSE NULL END) > 0
Make sure you have an indexes on customer_cars.customer_id and customer_cars.car_make to achieve maximum performance.
Upvotes: 1
Reputation: 34537
I would write it as
SELECT DISTINCT c.customer_id
FROM customers c
JOIN customers_cars cc_f on c.customer_id = cc_f.customer_id and cc_f.car_make = 'Ford'
JOIN customers_cars cc_b on c.customer_id = cc_b.customer_id and cc_b.car_make = 'BMW'
;
Whether this is better or not I don't know. In some RDBMs plain joins like this work better than subqueries, but I don't know about Postgres. From readability point of view it is also questionable.
Upvotes: 1
Reputation: 656431
You don't need to join to customers
at all (given relational integrity).
Generally, this is a case of relational division. We assembled an arsenal of techniques under this related question:
If (customer_id, car_make)
was defined unique in customers_cars
, it would get much simpler:
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1
HAVING count(*) = 2;
Since (customer_id, car_make)
is not unique, we need an extra step.
For only a few cars, your original query is not that bad. But (especially with duplicates!) EXISTS
is typically faster than IN
, and we don't need the final DISTINCT
:
SELECT customer_id -- no DISTINCT needed.
FROM customers c
WHERE EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
Above query gets verbose and less efficient for a longer list of cars. For an arbitrary number of cars I suggest:
SELECT customer_id
FROM (
SELECT customer_id, car_make
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1, 2
) sub
GROUP BY 1
HAVING count(*) = 2;
Upvotes: 0
Reputation: 126991
And here another option, don't know what the fastest one would be on large tables.
SELECT customers.*
FROM customers
JOIN customers_cars USING(customer_id)
WHERE car_make = ANY(ARRAY['BMW','Ford'])
GROUP BY
customer_id, name
HAVING array_agg(car_make) @> ARRAY['BMW','Ford'];
vol7ron: Fiddle
The following is a modification of the above, taking the same idea using an array for comparison. I'm not sure how any more efficient it would be compared to the dual-query approach, since it would have to create an array as one pass and then do more heavy-handed comparison because of comparing the elements of an array.
SELECT DISTINCT c.*
FROM customers c
WHERE customer_id IN (
select customer_id
from customers_cars
group by customer_id
having array_agg(car_make) @> ARRAY['BMW','Ford']
);
Upvotes: 2