Reputation: 19621
While there are similarly named questions, I haven't found any that present an equivalent scenario.
I'm having trouble getting JOIN to work well when I add a WHERE clause that uses a column in the adjoining table. For example:
SELECT a.*, b.unixTimestamp
FROM alpha a
LEFT JOIN beta b ON a.id = b.id
WHERE a.categoryId IN (1, 2, 3) AND b.unixTimestamp >= ?
It's a bit more complex than that in practice, but the general form is the same. This query takes about 1.5 seconds. However, if I remove AND b.unixTimestamp >= ?
from the query, it runs in about 1 millisecond.
For the sake of science, I've added three unique keys on beta
:
PRIMARY (id)
one (id, unixTimestamp)
two (unixTimestamp, id)
EXPLAIN shows that MySQL chooses the primary key, bu I expected it to use one
. Providing a key hint doesn't seem to make a difference in performance.
How can I use fields from two joined tables in a WHERE clause?
Note: I cannot easily move the test on unixTimestamp
to the ON
clause because of the way the framework I'm using is built. If I'm forced to do that, it would have to be complex at times because the unixTimestamp
test isn't necessarily the same for every categoryId
; for example:
ON a.id = b.id AND (
(a.categoryId IN (1, 2) AND b.unixTimestamp >= ?)
OR (a.categoryId IN (3, 4) AND b.unixTimestamp >= ?)
OR (a.categoryId IN (5, 6) AND b.unixTimestamp >= ?)
)
Update:
Looks like an ON clause doesn't work any better. At first I thought it did, but I had forgotten to change it to an inner join. Unfortunately, there are other complexities in the actual query that require me to add multiple columns from alpha
to the ON clause. I still can't seem to get the indices to line up. For example, this takes about 1.5 seconds:
SELECT alpha.*, b.unixTimestamp
FROM alpha a
INNER JOIN beta b ON (a.id = b.id AND (b.unixTimestamp >= ? or b.userId = ?))
WHERE a.categoryId IN (1, 2, 3)
There's really no way I can avoid columns from beta in the WHERE clause and multiple columns from alpha in the ON clause; it's either one or the other.
Upvotes: 0
Views: 53
Reputation: 6112
I'm not sure exactly how much flexibility you have in the query (being unfamiliar with XenForo), but you could try using a derived table with an index hint (though one would expect a hint to be unnecessary) to enforce filtering of your beta table results before being joined:
SELECT a.*, b.unixTimestamp
FROM alpha a
LEFT JOIN (
SELECT id, unixTimestamp
FROM beta
USE INDEX (two)
WHERE unixTimestamp >= ?
) b ON a.id = b.id
WHERE a.categoryId IN (1, 2, 3)
When materializing the derived table, there's no joins in the subquery, so there shouldn't be any issue with using the index on the timestamp column. Adding an index on unixTimestamp only (and deleting indexes one and two if they're otherwise unnecessary) may improve this query, but of course that might not be an option in your actual DB. But if the amount of data fetched from beta is too big to hold in memory, this query will not be performant, because the derived table will get pushed to disk.
The optimizer ought to add an index to the derived table on id, given that it will analyze the alpha table first (in order to determine whether it needs to materialize the derived table), and so should be able to tell that the join to the derived table is ref-eligible (as it would be an equality comparison of two indexed columns). Thus your join performance shouldn't be destroyed by using a derived table here. See 8.2.1.18.3.
On the other hand, the non-usage of the timestamp indexes might be because they simply are not useful, so the optimizer is making the right call. Maybe your chosen timestamp value for the where clause - or the timestamp + ID data itself - isn't selective. Optimizers are usually much better than humans at making these decisions, so it's difficult to tell without more, extremely detailed, information.
As an aside, you might want to consider optimizing that IN clause, if possible, by rewriting it using EXISTS. Take a peek at 8.2.1.18.4 for ideas.
Upvotes: 1