Habesha
Habesha

Reputation: 485

Compare Address against to addresses list

To check the customer address against to our SQL DB address list,to tell the address existed in our DB or not. The customer address comes in different format.

e.g:- 8455 W 23RD st OFFICE 4
     8455 WEST 23RD st OFC 4
     8455 WEST TWENTY-THIRD street OFC 4

If I do string compare,I will get bExist = False;

Note:- I am using C#, The server were the application run has no internet connection. Additional eg:-

APARTMENT   APT
APARTMENT   APARTMENT
BUILDING    BLDG
BUILDING    BUILDING
DEPARTMENT  DEPT
DEPARTMENT  DEPARTMENT
FLOOR   FL
FLOOR   FLOOR
HANGAR  HNGR
HANGAR  HANGAR
LOT LOT
OFFICE  OFC
OFFICE  OFFICE
PIER    PIER
PIER    PIER
ROOM    RM
ROOM    ROOM
SLIP    SLIP
SPACE   SPC
SPACE   SPACE
STOP    STOP
SUITE   STE
SUITE   SUITE
TRAILER TRLR
TRAILER TRAILER
UNIT    UNIT

N   NORTH
S   SOUTH
E   EAST
W   WEST
NE  NORTHEAST
SE  SOUTHEAST
NW  NORTHWEST
SW  SOUTHWEST
NORTH   NORTH
SOUTH   SOUTH
EAST    EAST
WEST    WEST
NORTHEAST   NORTHEAST
SOUTHEAST   SOUTHEAST
NORTHWEST   NORTHWEST
SOUTHWEST   SOUTHWEST
NORTH EAST  NORTHEAST
SOUTH EAST  SOUTHEAST
NORTH WEST  NORTHWEST
SOUTH WEST  SOUTHWEST

Upvotes: 3

Views: 2936

Answers (3)

Nicholas Carey
Nicholas Carey

Reputation: 74345

You can't even approach success in straight SQL. It's difficult even in a "normal" programming language. There are regional addressing styles to be considered, and so many address variants that people come up with, such as

  • 123 Main St, Apt 3A, ...
  • 123 Main St #3, ...
  • 123 Main St Box 3 ...

All of which reflect the same physical address.

And that doesn't even take into account simple data entry errors such as transposed characters.

Things quickly get very difficult. You'll spend an inordinate amount of time and energy dealing with whatever the next creative data entry mistake somebody makes.

What you want to do is to use the US Post Office's address standardization services prior to recording the address in your database. There are also quite a few 3rd party vendors that the USPS has certified for accuracy.

I worked for a startup for a while that used the Google Maps API to standardize addresses — another approach.

Upvotes: 1

Keith Nicholas
Keith Nicholas

Reputation: 44306

I'd go with something like...

  public static string Substitute(string s)
            {
                var abbrevs = new Dictionary<string, string>();
                abbrevs.Add("OFC", "OFFICE");
                abbrevs.Add("ST", "STREET");
                abbrevs.Add("ST.", "STREET");
                if (abbrevs.ContainsKey(s)) return abbrevs[s];
                return SubstituteWordNumbersForNumerics(s);            
            }

            public static string ToNormalAddressFormat(string address)
            {
                return address.Split(' ').ToList().Select(Substitute).Aggregate((x, y) => x + " " + y);
            }

Then convert all addresses before comparing

Upvotes: 0

Hart CO
Hart CO

Reputation: 34784

There are a number of options, if you have SSIS you could use their built in fuzzy matching which is pretty adequate.

I've used a tool from Ambient Concepts: Address Parser It's a collection of functions that will parse and standardize your addresses. They appear to offer a free-trial so you can test it out.

You could also do it from scratch, homogenizing data as best you can. We often separate street number and street name, since the vast majority of differences come from the pre/post directions ie: 12 north 5th st, 12 n 5th st, 12 5th st north, etc. As well as the numbers vs number-words.

Upvotes: 1

Related Questions