Patricio
Patricio

Reputation: 137

Using Full Text Search in standarization

I'm standardizing street addresses and i think full text search, maybe, work fine with my problem. My problem is that I have 2 millon of errors and using "LIKE" dont work correctly. This is what i have now:

UPDATE addresses
SET standarAddress = REPLACE (addresses.streetAddress, errors.streetAddress, corrects.streetAddress)
FROM corrects 
INNER JOIN errors 
ON corrects.id = errors.idCorrects
WHERE (addresses.streetAddress LIKE errors.streetAddress + ' ' + '[0-9]%')

but instead of LIKE (last line) I want to use this:

WHERE CONTAINS(addresses.streetAddress, '@MyErrorVariable')

The next is an real example with my tables:

--- addresses I want to standarize
CREATE TABLE addresses 
    (id INT PRIMARY KEY NOT NULL,
     direccion VARCHAR(250))
GO
--- addresses with errors like Acrwod Driv instead of Acrewood Drive
INSERT INTO addresses VALUES (1,'Acrwod Driv 45249')
INSERT INTO addresses VALUES (2,'Advntur Lne 45242')
INSERT INTO addresses VALUES (3,'Aborcrek Lan 45242')
GO
--- corrects street addresses
CREATE TABLE corrects
   (id INT PRIMARY KEY NOT NULL,
    streetAddress VARCHAR(250) NULL)
GO
INSERT INTO corrects VALUES (1,'Acrewood Drive')
INSERT INTO corrects VALUES (2,'Adventure Lane')
INSERT INTO corrects VALUES (3,'Arborcreek Lane')

GO
--- errors form corrects street addresses
CREATE TABLE errors
   (id INT PRIMARY KEY NOT NULL,
    streetAddress VARCHAR(250) NULL,
    idCorrects INT FOREIGN KEY REFERENCES corrects(id) NOT NULL)
GO
INSERT INTO errors VALUES (1,'Acrwod Driv',1)
INSERT INTO errors VALUES (2,'Advntur Lne',2)
INSERT INTO errors VALUES (3,'Aborcrek Lan',3)

My problem with constains is that in this WHERE CONTAINS(addresses.streetAddress, '@MyErrorVariable'), in my @MyErrorVariable only accept value like this 'Acrwod Driv' and all my errors are in errors table. Please any idea or help. Thanks!

Upvotes: 0

Views: 44

Answers (1)

paparazzo
paparazzo

Reputation: 45106

Full text is not going to help as that is a phrase and it would take a while to create the full text index?

Why do you have two tables if you have a one to one relationship on correct to error

Fix 
Wrong  PK 
Right    

UPDATE addresses
   SET standarAddress = REPLACE (addresses.streetAddress, Fix.Wrong, Fix.Right)
  FROM addresses
 INNER JOIN Fix
    ON addresses.streetAddress LIKE Fix.Wrong + ' ' + '[0-9]%' 
   and standarAddress <> REPLACE (addresses.streetAddress, Fix.Wrong, Fix.Right)

You may want to break it as 2 million is a big transaction log

UPDATE addresses
   SET standarAddress = REPLACE (addresses.streetAddress, Fix.Wrong, Fix.Right)
  FROM addresses
 INNER JOIN Fix
    ON addresses.streetAddress LIKE Fix.Wrong + ' ' + '[0-9]%' 
   and standarAddress <> REPLACE (addresses.streetAddress, Fix.Wrong, Fix.Right)
   and Substring(addresses.streetAddress, 0 , 1) = 'a' 
   and Substring(Fix.Wrong, 0 , 1) = 'a' 

... b ... c Substring will use an index

Upvotes: 1

Related Questions