Reputation: 19146
I have a many-to-many relationship between Books and Genres. For example "The Hobbit" Book may have the Genres "Kids", "Fiction" and "Fantasy".
Here's the schema:
CREATE TABLE "genre" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(50) NOT NULL
)
;
CREATE TABLE "book_genres" (
"book_id" integer NOT NULL REFERENCES "book" ("id"),
"genre_id" integer NOT NULL REFERENCES "genre" ("id"),
CONSTRAINT book_genres_pkey PRIMARY KEY (book_id, genre_id)
)
;
CREATE TABLE "book" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(255) NOT NULL,
"price" real NOT NULL
)
;
And the indexes:
CREATE INDEX "book_genres_36c249d7" ON "book_genres" ("book_id");
CREATE INDEX "book_genres_33e6008b" ON "book_genres" ("genre_id");
CREATE INDEX "book_5a5255da" ON "book" ("price");
Row counts:
I am trying to write a query in SQL which brings back all the Books for specific Genres ordered by price without duplicates.
Here's my query which does this:
SELECT name, price
FROM book
WHERE book.id
IN
(SELECT book_id
FROM book_genres
WHERE genre_id = 1
OR genre_id = 2)
ORDER BY price LIMIT 10
My problem is performance. This query can take up to 2000ms to execute. How can I improve the performance?
I have full control over the database (Postgres 9.3) so can add views, indexes or denormalise. I am also using Django so could perform multiple queries perform operations in memory using Python/Django.
Upvotes: 0
Views: 158
Reputation: 4503
SELECT b.name, b.price
FROM book b
WHERE EXISTS (
SELECT *
FROM book_genres bg
WHERE bg.book_id = b.id
AND bg.genre_id IN( 1 , 2)
)
ORDER BY b.price
LIMIT 10
;
The order by price+LIMIT can be a performance killer: check the query plan.
PLUS: replace the one-column indices by a "reversed" index: make book_id a FK into books.id and (maybe) omit the surrogate key id
CREATE TABLE book_genres
( book_id integer NOT NULL REFERENCES book (id)
, genre_id integer NOT NULL REFERENCES genre (id)
, PRIMARY KEY (book_id, genre_id)
) ;
CREATE INDEX ON book_genres (genre_id,book_id);
Upvotes: 3
Reputation: 44581
In most cases you can improve you performance using JOIN
instead of subqueries (Although it depends on many factors so ) :
SELECT *
FROM
(
SELECT b.name, b.price
FROM book b JOIN book_genres g ON b.book.id = g.book_id
AND g.genre_id = 1
UNION
SELECT b.name, b.price
FROM book b JOIN book_genres g ON b.book.id = g.book_id
AND g.genre_id = 2
)
ORDER BY price LIMIT 10
Upvotes: 2