stepanian
stepanian

Reputation: 11433

More performant update in MySQL than setting a boolean to an in clause

Let's assume I have a customer table that has a customer_id column and a has_orders column among others. Let's also assume I have an order table that has a customer_id column referencing customers in the customer table.

If I want to update the has_orders column, I can run the following statement:

update customer set has_orders = customer_id in (select customer_id from order)

This works. However, it's slow, even though I have indexes on customer_id in both tables. Is there another update statement (e.g. using joins) that can be faster?

Upvotes: 0

Views: 78

Answers (4)

Bohemian
Bohemian

Reputation: 425073

If has_orders is defined as boolean not null default false (a good choice), and orders can never be deleted (likely), update using a join:

update orders o
join customer c on c.id = o.customer_id
set has_orders = true

See live demo.

If order can be deleted (unlikely), update using an outer join, although this will be slower:

update customer c
left join `order` o on o.customer_id = c.id
set has_orders = o.customer_id is not null

See live demo.


Side note: If possible, prefer not naming tables/columns with reserved words, such as order, because it can require inconvenient delimiting with back-ticks to avoid syntax errors.

Upvotes: 1

Rick James
Rick James

Reputation: 142296

A religious war:

  • The DB purist would point out that has_order is redundant info, hence a no-no.

  • The performance fanatic would insist on having (and maintaining) has_order

I bring up that debate, not to wage a war, but to point out that getting rid of the flag may be better overall. When you need the flag, simply do a LEFT JOIN or AND EXISTS(SELECT..) or some other dynamic test.

Upvotes: 0

asd-tm
asd-tm

Reputation: 5263

Are you really sure you need to update the customer table asynchronously? I guess it might be a poor practice as the integrity of the data can be lost. Moreover this approach breakes normal form idea as the data is duplicated. It seems that a much better way is to retreive the value of has_orders selecting the flag (using exists statement proposed by @Tim-Biegeleisen or any other) from order table on demand and joining it to the dataset from customers table.

If the excess column is inevitable i would update it any time upon inserting/updating/deleting the order table and using affected customer id only. This can be done in a trigger. It will be rather fast.

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521457

An index on customer_id in the order table should already be pretty optimal. You could write your update using exists instead:

UPDATE customer c
SET has_orders = EXISTS (SELECT 1 FROM order o WHERE o.customer_id = c.customer_id);

The possible advantage of the above versus using IN is that EXISTS will stop as soon as it finds a match.

Upvotes: 1

Related Questions