Dimitry
Dimitry

Reputation: 99

I need some sort of full text search on mysql database

I've stuck with one quite tricky problem.

I have list of products from different warehouses, where each product have: Brand and Model plus some extra details. Model could be quite different from different warehouses for the same product, but Brand is always the same.

All list of products I store in one table, let's say it will be Product table. Then I have another table - Model, with CORRECT Model Name, Brand and additional details like image, description etc. Plus I have keywords column where I try to add all keywords manually.

And here is the problem, I need to associate each product that I receive from warehouse with one record from my Model table. Right now I'm using full text search in boolean mode, but that's quite painful and does not work very well. I need to do a lot of manual work.

Here are just few examples of names that I have:

The correct name for all of these items would be: WINTER SPORT 3D, so they should all be assigned to the same model.

So, is there any way to improve full text search or some other technique to solve my problem?

Database that I'm using is MySQL, I would prefer not to change it.

Upvotes: 1

Views: 392

Answers (2)

symcbean
symcbean

Reputation: 48377

I'll start by putting together a more formal definition of the tables:

warehouse:
    warehouse_id,
    warehouse_product_id,
    product_brand,
    product_name,
    local_id

Here I'd using local_id as a foreign key to your 'Model' table - but to avoid further confusion, I'll call it 'local'

local:
    id,
    product_brand,
    product_name

It seems like the table you describe as 'product' is redundant.

Obviously until the data is cross referenced, local_id will be null. But after it is populated it won't have to change, and given a warehouse_id, a band and a product, you can find your local descriptor easily:

SELECT local.*
FROM local, warehouse
WHERE local.id=warehouse.local_id
AND warehouse.product_brand=local.product_brand
AND warehouse_id=_____
AND warehouse.product_brand=____
AND warehouse.product_name=____

So all you need to do is populate the links. Soundex is a rather crude tool - a better solution for this would be the Levenstein distance algorithm. There's a mysql implementation here

Given a set of rows in the warehouse table which need to be populated:

SELECT w.*
FROM warehouse w
WHERE w.local_id IS NULL;

...for each row identify the best match as (using the values from the previous query as w.*)....

SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1

But this will find the best match, even if the 2 strings are completely different! Hence....

SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
AND levenstein(local.product_name, w.product_name)<
    (IF LENGTH(local.product_name)<LENGTH(w.product_name),
          LENGTH(local.product_name), LENGTH(w.product_name))/2
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1

...requires at least half the string to match.

So this can be implemented in a single update statement:

UPDATE warehouse w
SET local_id=(
   SELECT local.id
   FROM local
   WHERE local.product_brand=w.product_brand
   AND levenstein(local.product_name, w.product_name)<
    (IF LENGTH(local.product_name)<LENGTH(w.product_name),
          LENGTH(local.product_name), LENGTH(w.product_name))/2
   ORDER BY levenstein(local.product_name, w.product_name) ASC
   LIMIT 0,1
 )
 WHERE local_id IS NULL;

Upvotes: 1

matt eisenberg
matt eisenberg

Reputation: 183

Try Soundex. All of your examples resolve to W532 while the last one resolves to W536. So, you could:

  1. Add a column to PRODUCT and MODEL called SoundexValue and calculate the Soundex value for each product and model
  2. Compare the Soundex values in the PRODUCT table to the ones in the Model Table. You may have to use a range (+/- 5) to get a higher rate of matching.
  3. Follow the 80/20 rule. That is, spend 80% of your manual effort on the 20% that don't easily fall out.

Upvotes: 1

Related Questions