tws
tws

Reputation: 707

Postgresql OR conditions with an empty subquery

How can I optimize a query whose WHERE conditions include a check for user_id = X OR user_id IN (some subquery that might return no results)

In my example below, queries 1 and 2 are both extremely fast (< 1 ms), but query 3, which is simply an OR of the conditions in queries 1 and 2, is much slower (50 ms)

Can somebody please explain why query 3 is so slow, and in general what types of query optimization strategies should I be pursuing to avoid this problem? I realize the subquery in my example could easily be eliminated, but in real life sometimes subqueries seem like the least complicated way to get the data I want.

relevant code and data:

posts data https://dl.dropbox.com/u/4597000/StackOverflow/sanitized_posts.csv

users data https://dl.dropbox.com/u/4597000/StackOverflow/sanitized_users.csv

# from the shell:
# > createdb test

CREATE TABLE posts (
  id integer PRIMARY KEY NOT NULL,
  created_by_id integer NOT NULL,
  created_at integer NOT NULL
);
CREATE INDEX index_posts ON posts (created_by_id, created_at);
CREATE INDEX index_posts_2 ON posts (created_at);

CREATE TABLE users (
  id integer PRIMARY KEY NOT NULL,
  login varchar(50) NOT NULL
);
CREATE INDEX index_users ON users (login);

COPY posts FROM '/path/to/sanitized_posts.csv' DELIMITERS ',' CSV;
COPY users FROM '/path/to/sanitized_users.csv' DELIMITERS ',' CSV;


-- queries:

-- query 1, fast:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id = 123 LIMIT 100;

-- query 2, fast:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id IN (SELECT id FROM users WHERE login = 'nobodyhasthislogin') LIMIT 100;

-- query 3, slow:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id = 123 OR created_by_id IN (SELECT id FROM users WHERE login = 'nobodyhasthislogin') LIMIT 100;

Upvotes: 2

Views: 4640

Answers (3)

Scott S
Scott S

Reputation: 2746

Most of the time in this particular query is related to an index scan. Here is a query goes at it from a different angle to avoid this, but should return equivalent results.

SELECT posts.* FROM users JOIN posts on posts.created_by_id=users.id WHERE users.id=123 or login='nobodyhasthislogin'

This selects from the users table, doing the filter once, and then joins posts onto that.

I realize that the question is about tips for optimization, not really this specific query. To answer that, my suggestion is to run EXPLAIN ANALYZE and read up on interpreting the results, - this answer was helpful to me.

Upvotes: 0

David Aldridge
David Aldridge

Reputation: 52336

How about:

EXPLAIN ANALYZE
SELECT  * 
FROM    posts
WHERE   created_by_id IN (
           SELECT 123
           UNION ALL
           SELECT id FROM
           users  WHERE
           login  = 'nobodyhasthislogin') LIMIT 100;

Upvotes: 0

Wojtas
Wojtas

Reputation: 2354

Split the query (edited):

SELECT * FROM (
    SELECT * FROM posts p WHERE p.created_by_id = 123 
    union    
    SELECT * FROM posts p
    WHERE 
      EXISTS ( SELECT TRUE FROM users WHERE id = p.created_by_id AND login = 'nobodyhasthislogin')
  ) p
  LIMIT 100;

Upvotes: 2

Related Questions