Mert Malcok
Mert Malcok

Reputation: 43

Checking if a row exists is very slow in PostgreSQL

I have a product table and it has an indexed column manufacturer_id. The table has 30M rows. I want to check if a specific row exists in the table. Queries I used:

select exists(select 1 from product where manufacturer_id = '0');

select 1 from product where manufacturer_id = '0' fetch first row only;

select 1 from product where manufacturer_id = '0' limit 1;

Each query takes 4 minutes. There are 12M rows with manufacturer_id = '0' in the table. However, if I run the same query with manufacturer_id = '1234567' (which has 2 rows in the table), it returns the result in 250 milliseconds.

I would expect that checking row existence doesn't take long. Or, am I mistaken and it is normal to take that long given the table volume? If it is not normal, how could I improve the performance?

Execution plan:

Result  (cost=0.07..0.08 rows=1 width=1) (actual time=0.015..0.016 rows=1 loops=1)
  Buffers: shared hit=1
  InitPlan 1 (returns $0)
    ->  Seq Scan on product  (cost=0.00..914306.95 rows=12306868 width=0) (actual time=0.014..0.014 rows=1 loops=1)
          Filter: ((manufacturer_id)::text = '0'::text)
          Buffers: shared hit=1
Planning Time: 0.093 ms
Execution Time: 0.036 ms

Create Index statement: (this is the only index other than the primary key)

create index product_manufacturer_id
    on product (manufacturer_id);

manufacturer_id: varchar type and indexed column.

Postgres Version: 12

Upvotes: 1

Views: 2521

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

You want an index on manufacturer_id:

create index idx_product_manufacturer_id on product(manufacturer_id)

You may want additional keys after manufacturer_id to support other queries. But manufacturer_id needs to be the first column

Also, you need to be careful about types. So, you want the comparison to use the same type -- double quotes mean that the comparison is to a string not an integer. And even if they are both strings, the index may not be used if the collations are not the same.

Without an index, Postgres has to scan the table to find a match. Apparently, '1234567' shows up quickly in whatever order Postgres is scanning. But '0' does not.

An index fixes this problem. And it will also be helpful for joining to the manufacturers table.

Upvotes: 2

Related Questions