Reputation: 1335
Is there a way to use both an index for join
and a separate index for a different where
clause?
Whenever I add the join
query- it causes the where
index to be ignored.
I have Users table (80,000 records)
id | company_id |
---|---|
1 | 4 |
2 | 4 |
3 | 5 |
with btree index on company_id:
PRIMARY KEY, btree (id)
Index btree on company_id
Comments table (7,000,000 records)
id | user_id |
---|---|
11 | 2 |
22 | 2 |
33 | 3 |
with btree index on user_id:
PRIMARY KEY, btree (id)
Index btree on user_id
If I use a simple where - it uses the company index:
SELECT users.id
FROM users
WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
16146, 16147, 16333)
> Index Scan using index_users_on_company_id on users (cost=0.29..213.88 rows=97 width=4)
Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))
But when I add, although it uses user_id index at comments, it stopped using the company_id index (Parallel Seq Scan
instead of previously used Index Scan
):
SELECT users.id
FROM users INNER JOIN comments ON comments.user_id = users.id
WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333)
Gather (cost=1000.43..9632.88 rows=1410 width=4)
Workers Planned: 1
-> Nested Loop (cost=0.43..8491.88 rows=829 width=4)
-> Parallel Seq Scan on users (cost=0.00..2742.06 rows=57 width=4)
Filter: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))
-> Index Only Scan using index_comments_on_user_id on comments (cost=0.43..98.21 rows=266 width=4)
Index Cond: (user_id = users.id)
To summarize:
Is there a way to use both the company_id index and the user_id index in a single query?
Basically substitute the Parallel Seq Scan on users
with Index Scan using index_users_on_company_id on users
at the second query?
UPDATE:
After setting max_parallel_workers_per_gather
to 0, as @Anton Grig suggested, The same query, uses index instead of Parallel Seq Scan
It does performs faster. I wonder if there's a less intrusive way..
Nested Loop (cost=0.72..9998.67 rows=1410 width=4)
-> Index Scan using index_users_on_company_id on users (cost=0.29..213.88 rows=97 width=4)
Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[]))
-> Index Only Scan using index_comments_on_user_id on comments (cost=0.43..98.21 rows=266 width=4)
Index Cond: (user_id = users.id)
UPDATE2:
I'll add the full query (uses left outer join
) I'm running, not a minified version for question
SELECT COUNT(DISTINCT comments.tag_id) FILTER (WHERE comments.archived = FALSE ) AS tags_count
FROM users LEFT OUTER JOIN comments ON comments.user_id = users.id
WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333)
GROUP BY users.id
Upvotes: 0
Views: 95
Reputation: 1719
If using max_parallel_workers_per_gather seems intrusive, try changing the query while maintaining logic. I would try something like this:
SELECT user_id
FROM comments
WHERE user_id In (SELECT id
FROM users
WHERE company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
16146, 16147, 16333))
In my case, it still uses Index Cond
and is executed in about the same time.
I put it in the answer block to ease the perception of the query.
UPDATE:
If the selectivity of column user_id in table comments is low, I would suggest using Exist
operator as follows:
Select id
From users
Where Exists (Select user_id From comments Where user_id=users.id) And
company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
16146, 16147, 16333)
Note: It also removes duplicates from the output.
UPDATE2:
I would create another index in table User
instead of the previous one to provide Index Scan only -> Index Cond
.
CREATE INDEX idx ON Users(company_id, id);
and I would change the query as follows:
Select id,
(Select COUNT(DISTINCT comments.tag_id)
From comments
Where comments.archived = FALSE And user_id=users.id) As tags_count
From users
Where company_id IN (16111, 16198, 16113, 16133, 16135, 16134,
16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145,
16146, 16147, 16333)
Upvotes: 1