modulitos
modulitos

Reputation: 15814

SQL: How to speed up a query using indexing

I am trying to speed up a query to find all CUSTOMERs who have bought a MOTORCYCLE manufactured before 1970 AND bought another MOTORCYCLE manufactured after 2010. Since my query is running very slowly, I think that I need help with finding the better indexes. My attempts are documented below:

Tables

CREATE TABLE CUSTOMER (
  id int PRIMARY KEY, 
  fname varchar(30),     
  lname varchar(30)
);

CREATE TABLE MOTORCYCLE (
  id int PRIMARY KEY, 
  name varchar(30), 
  year int -- Manufactured year
);

CREATE TABLE SALES (
  cid int,
  mid int,
  FOREIGN KEY(cid) REFERENCES CUSTOMER(id), 
  FOREIGN KEY(mid) REFERENCES MOTOCYCLE(id),
  PRIMARY KEY(pid, mid, role)
);

Indexes

Here are my indexes (I am somewhat guessing with these, but this was my attempt):

CREATE UNIQUE INDEX customerID on CUSTOMER(id);
CREATE INDEX customerName on CUSTOMER(fname, lname);

CREATE UNIQUE INDEX motorcycleID on MOTORCYCLE(id);
CREATE INDEX motorcycleName on MOTORCYCLE(name);
CREATE INDEX motorcycleYear on MOTORCYCLE(year);

CREATE INDEX salesCustomerMotorcycleID on SALES(cid, mid);       
CREATE INDEX salesCustomerID on SALES(cid);
CREATE INDEX castsMotorcycleID on SALES(mid);

Queries

My query to find the customers purchasing bikes manufactured before 1970 and after 2010 is here:

SELECT fname, lname
FROM (SALES INNER JOIN CUSTOMER ON SALES.cid=CUSTOMER.id) INNER JOIN MOTORCYCLE ON MOTORCYCLE.id=SALES.mid
GROUP BY CUSTOMER.id
HAVING MIN(MOTORCYCLE.year) < 1970 AND MAX(MOTORCYCLE.year) > 2010;

And here is another working query which avoids the GROUP BY and HAVING clauses:

SELECT DISTINCT C.id, fname, lname
FROM (CUSTOMER as C inner join (SALES as S1 INNER JOIN MOTORCYCLE as M1 ON M1.id=S1.mid) on C.id=S1.cid) inner join (SALES as S2 inner join MOTORCYCLE as M2 on S2.mid=M2.id) on C.id=S2.cid
WHERE (M1.year < 1970 AND M2.year > 2010);

Any suggestions on the kinds of indexes I can use to speed up my query? Or should I change my query?

UPDATE

I found another query that also works, but it is also too slow. It has been added above. Still, it might be helpful when finding an index to speed it up.

Upvotes: 2

Views: 354

Answers (2)

CL.
CL.

Reputation: 180020

When you check out your queries with EXPLAIN QUERY PLAN, you see that in both cases, the database looks up many related records before it filters out unneeded records (with unwanted years).

The following queries look up the motorcycle IDs before matching; which one is faster depends on the details of your data and must be measured by you:

SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
              FROM Sales
              WHERE cid = Customer.id
                AND mid IN (SELECT id
                            FROM Motorcycle
                            WHERE year < 1970))
  AND EXISTS (SELECT 1
              FROM Sales
              WHERE cid = Customer.id
                AND mid IN (SELECT id
                            FROM Motorcycle
                            WHERE year > 2010));

SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
              FROM Sales AS s1
              JOIN Sales AS s2 ON s1.cid = s2.cid
              WHERE s1.cid = Customer.id
                AND s1.mid IN (SELECT id
                               FROM Motorcycle
                               WHERE year < 1970)
                AND s2.mid IN (SELECT id
                               FROM Motorcycle
                               WHERE year > 2010));

SQL Fiddle

Upvotes: 1

Nir-Z
Nir-Z

Reputation: 869

Why using group by when there's no using of aggregation function in the query? Use distinct instead if you don't want to see any duplication

Upvotes: 0

Related Questions