Pramod Shinde
Pramod Shinde

Reputation: 1892

How can we optimise NOT IN sub query on same table?

I have following query which is filtering results using sub queries with NOT IN how can I avoid NOT IN? Is there any alternative for NOT IN? How can I optimise overall query?

One option I see adding index on lower(display_name) but Is the any better or further optimisation I can do here?

SELECT  table_name.* FROM table_name 
WHERE (
        table_name.id IN (
        SELECT distinct on (name, mode, formulation, strength, generic_name) id FROM table_name 
        where lower(display_name) 
          NOT IN(SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL)
      ) 
ORDER BY table_name.created_at DESC LIMIT 100 OFFSET 0

Query Plan

------------------------------------------------------------------------------------------------
 Limit  (cost=9277.38..9277.38 rows=1 width=296)
   ->  Sort  (cost=9277.38..9351.72 rows=29736 width=296)
         Sort Key: table_name.created_at DESC
         ->  Nested Loop  (cost=8015.41..9128.70 rows=29736 width=296)
               ->  HashAggregate  (cost=8015.12..8017.12 rows=200 width=4)
                     Group Key: table_name_1.id
                     ->  Unique  (cost=7486.65..7932.69 rows=6594 width=93)
                           ->  Sort  (cost=7486.65..7560.99 rows=29736 width=93)
                                 Sort Key: table_name_1.name, table_name_1.mode, table_name_1.formulation, table_name_1.strength, table_name_1.generic_name
                                 ->  Seq Scan on table_name table_name_1  (cost=2286.19..5277.28 rows=29736 width=93)
                                       Filter: (NOT (hashed SubPlan 1))
                                       SubPlan 1
                                         ->  Bitmap Heap Scan on table_name table_name_2  (cost=106.16..2272.68 rows=5402 width=32)
                                               Recheck Cond: (user_id IS NULL)
                                               ->  Bitmap Index Scan on index_table_name_on_user_id  (cost=0.00..104.81 rows=5402 width=0)
                                                     Index Cond: (user_id IS NULL)
               ->  Index Scan using table_name_pkey on table_name  (cost=0.29..6.61 rows=1 width=296)
                     Index Cond: (id = table_name_1.id)

What I am trying to do is table_name is drugs which holds two list like

  1. System wide drugs like added by admins in this case we keep user_id as null.
  2. User specific drugs means added by user with user_id value

I am returning two results in search 1. admin drugs 2. all users drugs like

Admin list

admin_list = select * from table_name user_id IS NULL and LOWER(table_name.name) like %cal%

All display names from admin list which need to skip in following query

SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL

All users drugs but skipping / filtering out admin list based on display_name like

user_list = SELECT  table_name.* FROM table_name 
WHERE (
        table_name.id IN (
        SELECT distinct on (name, mode, formulation, strength, generic_name) id FROM table_name 
        where lower(display_name) 
          NOT IN(SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL)
      ) 
AND LOWER(table_name.name) LIKE '%cal%'
ORDER BY table_name.created_at DESC LIMIT 100 OFFSET 0

Goal to have two distinct lists admin_list and user_list where user_list should not contain any admin list display_name

Upvotes: 1

Views: 112

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You seem to want one id per name, mode, formulation, strength, generic_name combination where the display_name has no null user_id.

If that is correct, I'm sure there are simpler ways to write the query:

select t.*
from (select distinct on (name, mode, formulation, strength, generic_name) t.* 
      from (select t.*,
                   count(*) filter (where user_id is null) over (partition by lower(display_name)) as cnt
            from table_name
           ) t
      where cnt = 0
      order by name, mode, formulation, strength, generic_name, created_at desc
     ) t
order by created_at desc
limit 100 offset 0

Upvotes: 0

user330315
user330315

Reputation:

More often than not, NOT EXISTS if faster than NOT IN

SELECT t1.* 
FROM table_name t1
WHERE table_name.id IN (SELECT distinct on (t2.name, t2.mode, t2.formulation, t2.strength, t2.generic_name) t2.id 
                        FROM table_name t2
                        where not exists (select * 
                                          FROM table_name t3
                                          WHERE t3.user_id IS NULL
                                            and t2.lower(display_name) = t3.lower(display_name) )
                       ) 
ORDER BY table_name.created_at DESC 
LIMIT 100 OFFSET 0

An index on table_name ( (lower(display_name)) ) where user_is null should improve that even more.

Note that distinct on () without an ORDER BY is typically not a good idea.

Upvotes: 2

Related Questions