Reputation: 5897
I have a database table that stores imported information. For simplicity, its something like:
CREATE TABLE `data_import` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`amount` DECIMAL(12,2) NULL DEFAULT NULL,
`payee` VARCHAR(50) NULL DEFAULT NULL,
`posted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `payee` (`payee`)
)
I also have a table that stores import rules:
CREATE TABLE `import_rules` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`search` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `search` (`search`)
)
The idea is that for each imported transaction, the query needs to try find a single matching rule - this match is done on the data_import.payee and import_rules.seach fields. Because these are both varchar fields, I have indexed them in the hope of making the query faster.
This is what I have come up with so far, which seems to work fine. Albeit slower than I hoped.
SELECT i.id, i.payee, i.amount, i.posted r.id, r.search
FROM import_data id
LEFT JOIN import_rules ir on REPLACE(i.payee, ' ', '') = REPLACE(ir.search, ' ', '')
One thing that the above query does not cater for, is that if import_data.posted = 1, then I dont need to find a rule for that line - is it possible to stop the query joining on that particular row? Similarly, if the payee is null, then it shouldn't try join either.
Are there any other ways that I can optimise this? I realise that doing text joins is not ideal...not sure if there are any better methods.
Upvotes: 1
Views: 111
Reputation: 9926
I highly recommend doing anything you can to get rid of the REPLACE
s in that JOIN
. Using REPLACE
on both sides of the join totally eliminate the ability to use an index on either table.
Assuming you can get rid of the REPLACE
s (by cleansing the existing data and/or new data):
N
in VARCHAR(N)
as small
as you can as it will affect the side
of the index (or arguably, use index
prefixes).search
index on import_rules
UNIQUE
-- then you're sure to only
going to get 1 row result returned per row of
import_data
You can throw an AND
into your WHERE
clause if you'd like to enforce your 'don't join in this case' rule.
LEFT JOIN import_rules ir ON id.payee=ir.search AND id.posted != 1
Upvotes: 3
Reputation: 20612
The use of REPLACE() on the join is probably breaking the indexing, as it has an index of the values in the field, not the amended values after REPLACE().
As for not joining, you are already using a LEFT JOIN, so non-matching joins will result in NULLs for the import_rules fields; you should be able to add WHERE clauses to force that.
Upvotes: 2