Reputation: 359

Trying to Optimize PostgreSQL Nested WHERE IN

I have a Postgres (9.1) customer database similar to:

customers.id
customers.lastname
customers.firstname

invoices.id
invoices.customerid
invoices.total

invoicelines.id
invoicelines.invoiceid
invoicelines.itemcode
invoicelines.price

I built a search which lists all customers who have purchased a certain item (say 'abc').

Select * from customers WHERE customers.id IN
    (Select invoices.customerid FROM invoices WHERE invoices.id IN
        (Select invoicelines.invoiceid FROM invoicelines WHERE
        invoicelines.itemcode = 'abc')
    )

The search works fine and brings up the correct customers but takes about 10 seconds or so on a database of 2 million invoices and 2 million line items.

I was wondering if there was another approach that could trim that down a bit.

Upvotes: 1

Answers (3)

JCollerton

Reputation: 3327

Do you want all of the rows and columns in customer where the itemcode for that customer's item is 'abc'? If you join on the customerid then you can find all of the customer information for those items. If you have duplicates within that list you can use DISTINCT which will only give you one entry per customerID.

SELECT 
    DISTINCT [List of customer columns] 
FROM 
    customers 
INNER JOIN 
    invoicelines 
ON 
    customers.customerid = invoicelines.customerid
AND
    invoicelines.itemcode = 'abc'

Upvotes: 0

Gordon Linoff

Reputation: 1270081

You might switch to using exists instead. I suspect that this might work well:

Select c.*
from customers c 
where exists (Select 1
              from invoices i join
                   invoicelines il
                   on i.id = il.invoiceid and il.itemcode = 'abc'
              where c.id = i.customerid
             );

For this, you want to be sure you have the right indexes: invoices(customerid, id) and invoicelines(invoiceid, itemcode).

Upvotes: 1

Giorgos Betsos

Reputation: 72175

An alternative is to use EXISTS:

Select * 
from customers 
WHERE EXISTS (
   Select invoices.customerid 
   FROM invoices 
   JOIN invoicelines
      ON invoicelines.invoiceid = invoices.id AND
         invoicelines.itemcode = 'abc' AND
         customers.id = invoices.customerid)

Upvotes: 3

Trying to Optimize PostgreSQL Nested WHERE IN

Answers (3)

Related Questions