Optimizing bulk update with two inner joins and calculation

Question

So i have three tables: products, offers, and offer_lines

offer_lines is a join table to make an has and belongs to many relation:

products has_many offer_lines
offers has_many offer_lines

offer_lines has a column called calculated_price. This is based on the offers table's column discount_percentage and the products table's column price.

I want to create a sql statement that can calculate the discounted price of a product based on the discount percentage stored in the offers table.

This is what i got so far:

UPDATE offer_lines
SET discounted_price = (products.price - (products.price * offers.discount_percentage / 100))
FROM offer_lines AS o
  INNER JOIN products ON o.product_id = products.id
  INNER JOIN offers ON o.offer_id = offers.id
WHERE offer_lines.offer_id = 2;

This seems to work fine except for the fact that it takes about a minute to run.

Here is the explain:

Update on offer_lines  (cost=77.16..1131.37 rows=10670 width=87)
  ->  Hash Join  (cost=77.16..1131.37 rows=10670 width=87)
    Hash Cond: (o.product_id = products.id)
    ->  Hash Join  (cost=9.44..836.90 rows=10670 width=77)
          Hash Cond: (o.offer_id = offers.id)
          ->  Seq Scan on offer_lines o  (cost=0.00..620.74 rows=26674 width=14)
          ->  Hash  (cost=9.38..9.38 rows=4 width=71)
                ->  Nested Loop  (cost=0.29..9.38 rows=4 width=71)
                      ->  Index Scan using index_offer_lines_on_offer_id on offer_lines  (cost=0.29..8.30 rows=1 width=53)
                            Index Cond: (offer_id = 13)
                      ->  Seq Scan on offers  (cost=0.00..1.04 rows=4 width=18)
    ->  Hash  (cost=62.88..62.88 rows=388 width=18)
          ->  Seq Scan on products  (cost=0.00..62.88 rows=388 width=18)

Any ideas how i can make this run faster?

Gordon Linoff · Accepted Answer

First, the syntax for Postgres should not repeat the table being updated in the FROM clause:

UPDATE offer_lines ol
    SET discounted_price = (p.price - (p.price * o.discount_percentage / 100))
FROM products p, offers o
WHERE ol.offer_id = o.id AND
      ol.product_id = p.id AND
      ol.offer_id = 2;

(Wow, I can't believe I used a comma in the FROM clause. Arrrgh.)

Then, for this query, you want indexes on offer_lines(offer_id, product_id), products(id, price) (price is optional), and offers(id, discount_percentage) (discount_percentage is optional).

If I had to guess the performance problem is because order_lines was both in the update statement and the from clause.

EDIT:

I should be clear. You can repeat the table in the FROM clause. But it needs to be tied back to the version being updated:

UPDATE offer_lines ol
    SET discounted_price = (p.price - (p.price * o.discount_percentage / 100))
FROM offer_lines ol2 INNER JOIN
     products p
     ON ol2.product_id = p.id INNER JOIN
     offers o
     ON ol.offer_id = o.id
WHERE ol2.offer_id = 2 AND ol.id = o.id;

This assumes that offer_lines has a primary key column, which I've called id. Frankly, I can see the benefits of doing the update this way, in terms of readability.

Optimizing bulk update with two inner joins and calculation

Answers (1)

Related Questions