tkhuynh
tkhuynh

Reputation: 1019

PostgreSQL ON vs WHERE when joining tables?

I have 2 tables customer and coupons, a customer may or may not have a reward_id assigned to, so it's a nullable column. A customer can have many coupons and coupon belongs to a customer.

+-------------+------------+
|   coupons   | customers  |
+-------------+------------+
| id          | id         |
| customer_id | first_name |
| code        | reward_id  |
+-------------+------------+
customer_id column is indexed

I would like to make a join between 2 tables.

My attempt is:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id and cust.reward_id is not null

However, I think there isn't an index on reward_id, so I should move cust.reward_id is not null in where clause:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id
where cust.reward_id is not null

I wonder if the second attempt would be more efficient than the first attempt.

Upvotes: 4

Views: 3975

Answers (2)

Jacek Trociński
Jacek Trociński

Reputation: 920

In a PostgreSQL inner join, whether a filter condition is placed in the ON clause or the WHERE clause does not impact a query result or performance.

Here is a guide that explores this topic in more detail: https://app.pluralsight.com/guides/using-on-versus-where-clauses-to-combine-and-filter-data-in-postgresql-joins

Upvotes: 1

Kamil Gosciminski
Kamil Gosciminski

Reputation: 17147

It would be better if you see the execution plan on your own. Add EXPLAIN ANALYZE before your select statement and execute both to see the differences.

Here's how:

EXPLAIN ANALYZE select ...

What it does? It actually executes the select statement and gives you back the execution plan which was chosen by query optimizer. Without ANALYZE keyword it would only estimate the execution plan without actually executing the statement in the background.

Database won't use two indexes at one time, so having an index on customer(id) will make it unable to use index on customer(reward_id). This condition will actually be treated as a filter condition which is correct behaviour.

You could experiment with performance of a partial index created as such: customer(id) where reward_id is not null. This would decrease index size as it would only store these customer id's for which there is a reward_id assigned.

I generally like to split the relationship/join logic from conditions applied and I myself put them within the WHERE clause because it's more visible in there and easier to read for future if there are any more changes.

I suggest you see for yourself the possible performance gain, because it depends on how much data there is and the possible low cardinality for reward_id. For example if most rows have this column filled with a value it wouldn't make that much of a difference as the index size (normal vs partial) would be almost the same.

Upvotes: 1

Related Questions