Richard
Richard

Reputation: 65510

Speeding up a Postgres query on millions of rows?

I am using Postgres 9.4. I have a large table. This is the structure of my table:

 processing_date | date                 |
 practice_id     | character varying(6) |
 chemical_id     | character varying(9) |
 items           | bigint               |
 cost            | double precision     |
Indexes:
    "vw_idx_chem_by_practice_chem_id" btree (chemical_id)
    "vw_idx_chem_by_practice_chem_id_vc" btree (chemical_id varchar_pattern_ops)
    "vw_idx_chem_by_practice_joint_id" btree (practice_id, chemical_id)

Now I want to run a LIKE query on the table. This is my query:

EXPLAIN (ANALYSE, BUFFERS) SELECT sum(pr.cost) as actual_cost, 
      sum(pr.items) as items, pr.practice_id as row_id, 
      pc.name as row_name, pr.processing_date as date 
FROM vw_chemical_summary_by_practice pr 
JOIN frontend_practice pc ON pr.practice_id=pc.code 
WHERE (pr.chemical_id LIKE '0401%' ) 
GROUP BY pr.practice_id, pc.code, date 
ORDER BY date, pr.practice_id;

And this is the result of an EXPLAIN: http://explain.depesz.com/s/lYRT

As you can see, it's slow partly because it's running a bitmap heap scan on nearly 4 million rows. (The subsequent sort is also slow.)

Is there anything I can do to speed this up?

I'm wondering if I should create a further materialized view, or whether a multicolumn index would help, so that Postgres can look in the index rather than disk.

Is there some way I can also make the sort more efficient?

UPDATE: Here's the definition of the materialized view:

    CREATE MATERIALIZED VIEW vw_chemical_summary_by_practice
    AS SELECT processing_date, practice_id, chemical_id, 
    SUM(total_items) AS items, SUM(actual_cost) AS cost
    FROM frontend_prescription
    GROUP BY processing_date, practice_id, chemical_id

And the underlying table:

id                | integer                 | not null default nextval('frontend_prescription_id_seq'::regclass)
 presentation_code | character varying(15)   | not null
 total_items       | integer                 | not null
 actual_cost       | double precision        | not null
 processing_date   | date                    | not null
 practice_id       | character varying(6)    | not null
Indexes:
    "frontend_prescription_pkey" PRIMARY KEY, btree (id)
    "frontend_prescription_528f368c" btree (processing_date)
    "frontend_prescription_6ea07fe3" btree (practice_id)
    "frontend_prescription_idx_code" btree (presentation_code varchar_pattern_ops)
    "frontend_prescription_idx_date_and_code" btree (processing_date, presentation_code)

UPDATE 2: In case it wasn't clear, I need to get total spending and items by practice and by month, across all chemical_ids that start with `0401.

Upvotes: 2

Views: 2247

Answers (1)

joop
joop

Reputation: 4503

-- assuming this is your original table:
CREATE TABLE practice_chemical_old
    ( processing_date date NOT NULL
    , practice_id     character varying(6) NOT NULL
    , chemical_id     character varying(9) NOT NULL
    , items           bigint NOT NULL DEFAULT NULL
    , cost            double precision
    );

-- create these three new tables to decompose it into
CREATE TABLE practice
    ( practice_id SERIAL NOT NULL PRIMARY KEY
    , practice_name character varying(6) UNIQUE
    );
CREATE TABLE chemical
    ( chemical_id SERIAL NOT NULL PRIMARY KEY
    , chemical_name character varying(9) UNIQUE
    );

CREATE TABLE practice_chemical_new
    ( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
    , chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
    , processing_date date NOT NULL
    , items bigint NOT NULL default 0
    , cost double precision
            -- Not sure if processing_date should be part of the key, too
    , PRIMARY KEY (practice_id, chemical_id)
    );

CREATE UNIQUE INDEX ON practice_chemical_new(chemical_id, practice_id);

INSERT INTO practice(practice_name)
SELECT DISTINCT practice_id FROM practice_chemical_old;

INSERT INTO chemical(chemical_name)
SELECT DISTINCT chemical_id FROM practice_chemical_old;

-- now populate the new tables from the old ones ...
INSERT INTO practice_chemical_new(practice_id, chemical_id, processing_date,items,cost)
SELECT p.practice_id, c.chemical_id, pco.processing_date, pco.items, pco.cost
FROM practice_chemical_old pco
JOIN practice p ON p.practice_name = pco.practice_id
JOIN chemical c ON c.chemical_name = pco.chemical_id
    ;

-- Now,  the original table *could* be represented by the following view (or table, or table expression):
CREATE VIEW practice_chemical_fake AS
SELECT pcn.processing_date AS processing_date
    , p.practice_name AS practice_id
    , c.chemical_name AS chemical_id
    , pcn.items AS items
    , pcn.cost AS cost
FROM practice_chemical_new pcn
JOIN practice p ON p.practice_id = pcn.practice_id
JOIN chemical c ON c.chemical_id = pcn.chemical_id
    ;

Note: it is not clear from the original question if multiple instances of {practice,chemical} are possible (with different processing_date). You could need to change the definition of the PK a bit.

Upvotes: 3

Related Questions