Roko
Roko

Reputation: 1335

Using join prevents index use on where clause?

Is there a way to use both an index for join and a separate index for a different where clause? Whenever I add the join query- it causes the where index to be ignored.

I have Users table (80,000 records)

id company_id
1 4
2 4
3 5

with btree index on company_id: PRIMARY KEY, btree (id) Index btree on company_id

Comments table (7,000,000 records)

id user_id
11 2
22 2
33 3

with btree index on user_id:

If I use a simple where - it uses the company index:


    SELECT users.id
    FROM users
    WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
      16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
      16146, 16147, 16333)
    
> Index Scan using index_users_on_company_id on users  (cost=0.29..213.88 rows=97 width=4)
  Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))

But when I add, although it uses user_id index at comments, it stopped using the company_id index (Parallel Seq Scan instead of previously used Index Scan):

SELECT users.id
FROM users INNER JOIN comments ON comments.user_id = users.id
WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333)

Gather  (cost=1000.43..9632.88 rows=1410 width=4)
  Workers Planned: 1
  ->  Nested Loop  (cost=0.43..8491.88 rows=829 width=4)
        ->  Parallel Seq Scan on users  (cost=0.00..2742.06 rows=57 width=4)
              Filter: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))
        ->  Index Only Scan using index_comments_on_user_id on comments  (cost=0.43..98.21 rows=266 width=4)
              Index Cond: (user_id = users.id)

To summarize:

Is there a way to use both the company_id index and the user_id index in a single query?

Basically substitute the Parallel Seq Scan on users with Index Scan using index_users_on_company_id on users at the second query?

UPDATE:

After setting max_parallel_workers_per_gather to 0, as @Anton Grig suggested, The same query, uses index instead of Parallel Seq Scan It does performs faster. I wonder if there's a less intrusive way..

Nested Loop  (cost=0.72..9998.67 rows=1410 width=4)
  ->  Index Scan using index_users_on_company_id on users  (cost=0.29..213.88 rows=97 width=4)
        Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))
  ->  Index Only Scan using index_comments_on_user_id on comments  (cost=0.43..98.21 rows=266 width=4)
        Index Cond: (user_id = users.id)

UPDATE2:

I'll add the full query (uses left outer join) I'm running, not a minified version for question

SELECT COUNT(DISTINCT comments.tag_id) FILTER (WHERE comments.archived = FALSE ) AS tags_count
FROM users LEFT OUTER JOIN comments ON comments.user_id = users.id
WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333)
GROUP BY users.id

Upvotes: 0

Views: 95

Answers (1)

Anton Grig
Anton Grig

Reputation: 1719

If using max_parallel_workers_per_gather seems intrusive, try changing the query while maintaining logic. I would try something like this:

SELECT user_id
FROM comments 
WHERE user_id In (SELECT id
                  FROM users
                  WHERE company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
                                       16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
                                       16146, 16147, 16333))

In my case, it still uses Index Cond and is executed in about the same time.

I put it in the answer block to ease the perception of the query.

UPDATE:

If the selectivity of column user_id in table comments is low, I would suggest using Exist operator as follows:

Select id
From users
Where Exists (Select user_id From comments Where user_id=users.id) And
      company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
                     16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
                     16146, 16147, 16333)

Note: It also removes duplicates from the output.

UPDATE2:

I would create another index in table User instead of the previous one to provide Index Scan only -> Index Cond.

CREATE INDEX idx ON Users(company_id, id);

and I would change the query as follows:

Select id, 
       (Select COUNT(DISTINCT comments.tag_id)
        From comments
        Where comments.archived = FALSE And user_id=users.id) As tags_count
From users
Where company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
                     16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
                     16146, 16147, 16333)

Upvotes: 1

Related Questions