Reputation: 485
To check the customer address against to our SQL DB address list,to tell the address existed in our DB or not. The customer address comes in different format.
e.g:- 8455 W 23RD st OFFICE 4
8455 WEST 23RD st OFC 4
8455 WEST TWENTY-THIRD street OFC 4
If I do string compare,I will get bExist = False;
Note:- I am using C#, The server were the application run has no internet connection. Additional eg:-
APARTMENT APT
APARTMENT APARTMENT
BUILDING BLDG
BUILDING BUILDING
DEPARTMENT DEPT
DEPARTMENT DEPARTMENT
FLOOR FL
FLOOR FLOOR
HANGAR HNGR
HANGAR HANGAR
LOT LOT
OFFICE OFC
OFFICE OFFICE
PIER PIER
PIER PIER
ROOM RM
ROOM ROOM
SLIP SLIP
SPACE SPC
SPACE SPACE
STOP STOP
SUITE STE
SUITE SUITE
TRAILER TRLR
TRAILER TRAILER
UNIT UNIT
N NORTH
S SOUTH
E EAST
W WEST
NE NORTHEAST
SE SOUTHEAST
NW NORTHWEST
SW SOUTHWEST
NORTH NORTH
SOUTH SOUTH
EAST EAST
WEST WEST
NORTHEAST NORTHEAST
SOUTHEAST SOUTHEAST
NORTHWEST NORTHWEST
SOUTHWEST SOUTHWEST
NORTH EAST NORTHEAST
SOUTH EAST SOUTHEAST
NORTH WEST NORTHWEST
SOUTH WEST SOUTHWEST
Upvotes: 3
Views: 2936
Reputation: 74345
You can't even approach success in straight SQL. It's difficult even in a "normal" programming language. There are regional addressing styles to be considered, and so many address variants that people come up with, such as
All of which reflect the same physical address.
And that doesn't even take into account simple data entry errors such as transposed characters.
Things quickly get very difficult. You'll spend an inordinate amount of time and energy dealing with whatever the next creative data entry mistake somebody makes.
What you want to do is to use the US Post Office's address standardization services prior to recording the address in your database. There are also quite a few 3rd party vendors that the USPS has certified for accuracy.
I worked for a startup for a while that used the Google Maps API to standardize addresses — another approach.
Upvotes: 1
Reputation: 44306
I'd go with something like...
public static string Substitute(string s)
{
var abbrevs = new Dictionary<string, string>();
abbrevs.Add("OFC", "OFFICE");
abbrevs.Add("ST", "STREET");
abbrevs.Add("ST.", "STREET");
if (abbrevs.ContainsKey(s)) return abbrevs[s];
return SubstituteWordNumbersForNumerics(s);
}
public static string ToNormalAddressFormat(string address)
{
return address.Split(' ').ToList().Select(Substitute).Aggregate((x, y) => x + " " + y);
}
Then convert all addresses before comparing
Upvotes: 0
Reputation: 34784
There are a number of options, if you have SSIS you could use their built in fuzzy matching which is pretty adequate.
I've used a tool from Ambient Concepts: Address Parser It's a collection of functions that will parse and standardize your addresses. They appear to offer a free-trial so you can test it out.
You could also do it from scratch, homogenizing data as best you can. We often separate street number and street name, since the vast majority of differences come from the pre/post directions ie: 12 north 5th st, 12 n 5th st, 12 5th st north, etc. As well as the numbers vs number-words.
Upvotes: 1