Digil
Digil

Reputation: 69

PSQL Optimize left join

I have 2 tables import_data and api_request_record defined as follows

CREATE TABLE public.import_data (
    id uuid NOT NULL,
    company_id integer NOT NULL,
    business_name character varying,
    no_of_rating double precision,
    no_of_reviews double precision,
    phone character varying,
    email text,
    state character varying,
    city character varying,
    zip character varying,
    address character varying,
    category character varying,
    upserted_by integer,
    upsert_date timestamp without time zone,
    vendor character varying,
    purchased boolean
)
PARTITION BY LIST (company_id);

CREATE TABLE import_data_1 PARTITION OF import_data FOR VALUES IN ('1');

import_data holds the data about SME businesses. Almost all fields are indexed. The table is partitioned.

CREATE TABLE public.api_request_record (
    id integer NOT NULL,
    company_id integer NOT NULL,
    create_date timestamp without time zone,
    data_record_id uuid NOT NULL,
    phone character varying,
    valid_number boolean,
    validity_check_date timestamp without time zone,
    phone_location character varying,
    carrier character varying,
    line_type character varying,
    dnc_compliant boolean,
    dnc_check_date timestamp without time zone,
    lc_compliant boolean,
    lc_check_date timestamp without time zone
)
PARTITION BY LIST (company_id);

ALTER TABLE public.api_request_record
    ADD CONSTRAINT api_request_record_data_record_id_company_id_fkey FOREIGN KEY (data_record_id, company_id) REFERENCES public.import_data(id, company_id) ON DELETE CASCADE;

CREATE TABLE api_request_record_1 PARTITION OF api_request_record FOR VALUES IN ('1');

api_request_record holds data about API request info of the import_data records for each company. Most of the fields are indexed. It is also partitioned.

Now I'm trying to select some info about records. My SQL query goes as

SELECT
    import_data.id,
    import_data.phone,
    import_data.business_name,
    import_data.address,
    api_request_record.lc_compliant,
    api_request_record.dnc_compliant
FROM
    import_data
    LEFT JOIN (
        SELECT
            *
        FROM
            api_request_record
        WHERE
            company_id = 1
            AND (api_request_record.dnc_compliant = TRUE
                OR api_request_record.lc_compliant = TRUE)
            AND (api_request_record.dnc_compliant = TRUE)) AS api_request_record ON api_request_record.data_record_id = import_data.id
WHERE
    import_data.company_id = 1
    AND (api_request_record.dnc_compliant = TRUE
        OR api_request_record.lc_compliant = TRUE)
    AND (api_request_record.dnc_compliant = TRUE)
ORDER BY
    import_data.phone DESC
LIMIT 100 OFFSET 600;

The problem is that in order to 100 records from this query it takes about 30 seconds when the number of records in both tables is about 1.5 million records. And it increases as the offset increases.

I'm working on

Here is the EXPLAIN (analyze, buffers, format text) https://explain.depesz.com/s/AQOZ

Is there a way to optimize this query, so that the query completes in less time? Thanks in advance.

Upvotes: 0

Views: 59

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You seem to want:

Your query can be radically simplified:

  • The outer join is really an inner join due to the WHERE clause.
  • The boolean conditions are much more complicated than necessary ( "(A OR B) AND A" is really just "A").
  • No subquery is necessary.
  • The JOIN conditions can be expanded.

So:

SELECT id.id, id.phone, id.business_name, id.address,
       arr.lc_compliant, arr_record.dnc_compliant
FROM import_data id JOIN
     api_request_record arr
     ON arr.company_id = id.company_id AND
        arr.data_record_id = id.id
WHERE id.company_id = 1 AND
      arr.dnc_compliant
ORDER BY id.phone DESC
LIMIT 100 OFFSET 600;

Then for this query, I would recommend the following indexes:

  • import_data(company_id, id)
  • api_request_record(company_id, data_record_id, dnc_compliant)

Note that these are indexes with multiple keys, not separate indexes on each key.

Upvotes: 1

Philippe
Philippe

Reputation: 1827

Try something like this

SELECT
    import_data.id,
    import_data.phone,
    import_data.business_name,
    import_data.address,
    api_request_record.lc_compliant,
    api_request_record.dnc_compliant
FROM
    import_data
    LEFT JOIN api_request_record on api_request_record.data_record_id = import_data.id AND (api_request_record.dnc_compliant = TRUE OR api_request_record.lc_compliant = TRUE)
WHERE
    import_data.company_id = 1
ORDER BY
    import_data.phone DESC
LIMIT 100 OFFSET 600;

Upvotes: 0

Related Questions