JonoB
JonoB

Reputation: 5897

MySQL query optimisation

I have a database table that stores imported information. For simplicity, its something like:

CREATE TABLE `data_import` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`amount` DECIMAL(12,2) NULL DEFAULT NULL,
`payee` VARCHAR(50) NULL DEFAULT NULL,
`posted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `payee` (`payee`)
)

I also have a table that stores import rules:

CREATE TABLE `import_rules` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`search` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `search` (`search`)
)

The idea is that for each imported transaction, the query needs to try find a single matching rule - this match is done on the data_import.payee and import_rules.seach fields. Because these are both varchar fields, I have indexed them in the hope of making the query faster.

This is what I have come up with so far, which seems to work fine. Albeit slower than I hoped.

SELECT i.id, i.payee, i.amount, i.posted r.id, r.search
FROM import_data id
LEFT JOIN import_rules ir on REPLACE(i.payee, ' ', '') = REPLACE(ir.search, ' ', '')

One thing that the above query does not cater for, is that if import_data.posted = 1, then I dont need to find a rule for that line - is it possible to stop the query joining on that particular row? Similarly, if the payee is null, then it shouldn't try join either.

Are there any other ways that I can optimise this? I realise that doing text joins is not ideal...not sure if there are any better methods.

Upvotes: 1

Views: 111

Answers (2)

Riedsio
Riedsio

Reputation: 9926

I highly recommend doing anything you can to get rid of the REPLACEs in that JOIN. Using REPLACE on both sides of the join totally eliminate the ability to use an index on either table.

Assuming you can get rid of the REPLACEs (by cleansing the existing data and/or new data):

  • If you need to join on text columns, use a single byte per character charset if you application allows for it (for a smaller/faster index).
  • Make the N in VARCHAR(N) as small as you can as it will affect the side of the index (or arguably, use index prefixes).
  • I imagine you want to make the search index on import_rules UNIQUE -- then you're sure to only going to get 1 row result returned per row of import_data

You can throw an AND into your WHERE clause if you'd like to enforce your 'don't join in this case' rule.

LEFT JOIN import_rules ir ON id.payee=ir.search AND id.posted != 1

Upvotes: 3

Orbling
Orbling

Reputation: 20612

The use of REPLACE() on the join is probably breaking the indexing, as it has an index of the values in the field, not the amended values after REPLACE().

As for not joining, you are already using a LEFT JOIN, so non-matching joins will result in NULLs for the import_rules fields; you should be able to add WHERE clauses to force that.

Upvotes: 2

Related Questions