Richard
Richard

Reputation: 65510

Postgres: How to JOIN

I am working on Postgres 9.3. I have two tables, the first for payment items:

                   Table "public.prescription"
      Column       |          Type           |                             Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
 id                | integer                 | not null default nextval('frontend_prescription_id_seq'::regclass)
 presentation_code | character varying(15)   | not null
 presentation_name | character varying(1000) | not null
 actual_cost       | double precision        | not null
 pct_id            | character varying(3)    | not null

And the second for organisations:

           Table "public.pct"
      Column       |          Type           | Modifiers
-------------------+-------------------------+-----------
 code              | character varying(3)    | not null
 name              | character varying(200)  |

I have a query to get all the payments for a particular code:

SELECT sum(actual_cost) as total_cost, pct_id as row_id 
FROM prescription 
WHERE presentation_code='1234' GROUP BY pct_id

Here is the query plan for that query.

Now, I'd like to annotate each row with the name property of the associated organisation. This is what I'm trying:

SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription, pct
WHERE prescription.presentation_code='0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;

Here's the ANALYSE for that query. It's incredibly slow: what am I doing wrong?

I think there must be a way to annotate each row with the pct.name AFTER the first query has run, which would be faster.

Upvotes: 0

Views: 99

Answers (2)

Patrick
Patrick

Reputation: 32161

You are taking data from 2 tables, but you do not join the tables in any way. Effectively, you make a full join, resulting in the Cartesian product of both tables. If you look at your ANALYZE statistics, you see that your nested loop processes 62 million rows. that takes time.

Add in a join condition to make this all fast:

SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription
JOIN pct On pct.code = prescription.pct_id
WHERE prescription.presentation_code = '0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;

Upvotes: 0

fa44
fa44

Reputation: 420

With JOIN (LEFT JOIN in this case, because we want the line even if there is no pct):

SELECT 
    sum(prescription.actual_cost) as total_cost,
    prescription.pct_id,
    pct.name as row_id
FROM prescription
    LEFT JOIN pct ON pct.code = prescription.pct_id
WHERE 
    prescription.presentation_code='0212000AAAAAAAA'
GROUP BY 
    prescription.pct_id,
    pct.name;

I don't know if it's work well, I didn't try this query.

Upvotes: 2

Related Questions