Steven

Reputation: 19435

What is the best way to spellcheck a street address?

When importing new addresses into my DB, I run a spellcheck to see whether the street already exists (that is, whether the new street is just a misspelling of an existing one).

We are currently using the Levenshtein method in a MySQL query to find similar street names. The problem is street numbers: having street numbers in the address really slows down the similarity search / spellchecking.

Example:

Street abc 34
Street abc 37
Street abc 39

These street names are spelled correctly, but the Levenshtein method thinks they are misspelled because of the street numbers.

We have developed a PHP function that takes anything after (and including) the first digit and puts it in another column.

This works fine for streets having the street number at the end, but will not work for countries having the street numbers at the start.
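To illustrate the splitting step for both layouts, here is a minimal Python sketch (not the PHP function mentioned above, which is not shown). The helper name `split_house_number` and both patterns are assumptions made for this example. It tries a trailing number first and then a leading one, and it leaves ordinals such as "9th" alone.

```python
import re

# Hypothetical patterns: a house number is a run of digits with an optional
# single letter suffix (e.g. "12B"), at either end of the line.
_TRAILING = re.compile(r"^\s*(.+?)\s+(\d+[A-Za-z]?)\s*$")
_LEADING = re.compile(r"^\s*(\d+[A-Za-z]?)\s+(.+?)\s*$")


def split_house_number(line):
    """Return (street, number); number is None when no house number is found."""
    match = _TRAILING.match(line)
    if match:
        return match.group(1), match.group(2)
    match = _LEADING.match(line)
    if match:
        return match.group(2), match.group(1)
    return line.strip(), None


print(split_house_number("Street abc 34"))
print(split_house_number("34 Main Street"))
print(split_house_number("9th Avenue"))
```

This is only a heuristic: a name like "Highway 9" would be split as if 9 were a house number, so the result probably still needs a sanity check per country.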

I'm wondering if anyone else has worked on similar problems?

Update
The solution is for a store locator web site and I'm currently working on the module that will import store lists.

One solution is to use the Google Maps API and see whether it returns a geocoded address.

Upvotes: 1

Views: 632

Answers (3)

Jonathan Oliver

Reputation: 5267

This is a very common problem. You can have multiple addresses that all represent the same physical location but are structured differently. For example:

100 North 250 West
100 North 250W
100 North 250 W
100N 250 West
100 N 250 West
100 North 250 West

According to the US Postal Service, the standardized address is 100 N 250 W. Only by resolving each of these addresses to a standardized format can you accurately remove duplicates and ensure consistent results.
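To make the idea concrete, here is a minimal Python sketch that collapses the variants above into one form. It is not the USPS standardization algorithm. The `DIRECTIONALS` table and the function name `normalize_directionals` are assumptions made for this example, and it handles only the four compass words.

```python
import re

# Assumed abbreviation table; real postal standardization covers far more
# (street suffixes, unit designators, and so on).
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_directionals(address):
    # Split a number glued to a single directional letter: "250W" -> "250 W".
    # Ordinals such as "1st" or "2nd" are left alone because the letter must
    # end the word.
    spaced = re.sub(r"(\d)([NSEWnsew])\b", r"\1 \2", address)
    tokens = spaced.upper().split()
    return " ".join(DIRECTIONALS.get(token, token) for token in tokens)


variants = [
    "100 North 250 West",
    "100 North 250W",
    "100 North 250 W",
    "100N 250 West",
    "100 N 250 West",
]
print({normalize_directionals(v) for v in variants})
```

A rule like this will misread some inputs (a unit such as "12E" would be split too), which is part of why a validated master list does better than local rules.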

Addresses are extremely difficult to standardize without some additional context. The context I am referring to is an up-to-date master list of all the valid/deliverable addresses in the country. This is not actually available in list format (it would be huge) but is accessible through an API. The US Postal Service makes their API available, and there are other companies that take the USPS data and enhance it through their own APIs. The enhancements are typically faster service and guaranteed uptime, as well as additional address processing functions and more data returned about the address.

So, in quick answer, the best way to do spellcheck on a street address would be to use an API to validate the full address.

In the interest of full disclosure, I'm the founder of SmartyStreets and we do address verification. If you are a nonprofit organization, you can use our services at no charge. There are several address verification companies out there--just do a Google search for "address verification" and you'll find a bunch.

Upvotes: 0

user1074324

Reputation:

You can use the APIs from FedEx, UPS, or USPS to validate an address. This is done on lots of e-commerce sites for shipping addresses; that's why you sometimes see

"Did you mean this address"...

You can also do this with the Google Maps API.

Upvotes: 1

alex

Reputation: 5201

Uh-oh, generic address validation is an extremely hard problem. My suggestion is that you perform the minimal amount of validation you can tolerate.

If this is for shipping purposes, for instance, just use dropdowns for the stuff that's going to determine shipping costs. If you have different shipping costs for different countries, just provide a free-form text area with no validation and a countries dropdown. If the user can't spell their address, tough luck. You can have whoever handles shipping verify the address "humanly". Delivery and postal companies can mostly deliver parcels to misspelled addresses (Randomcountry's postal service probably knows their street names better than you do, anyway).

If you really need precise addresses, try to find a third-party solution for this. Using Google Maps API might work, and there exist paid solutions for this.

Considering your algorithm, though, the following solution springs to mind: just use a regex to strip numbers (or even non-letters). However, keep in mind that there are correct street names that contain numbers (e.g. NY's 9th Avenue).
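As a rough Python sketch of that idea: the regex below removes only standalone digit tokens, so "9th Avenue" survives, and a plain dynamic-programming Levenshtein then compares what is left. Both function names are made up for this example, and the distance function stands in for whatever Levenshtein implementation the MySQL query uses.

```python
import re


def strip_house_numbers(street):
    # Drop standalone digit tokens ("34") but keep tokens that merely
    # contain digits ("9th"), then collapse the leftover whitespace.
    return " ".join(re.sub(r"\b\d+\b", " ", street).split())


def levenshtein(a, b):
    # Classic row-by-row edit distance (insert, delete, substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


a = strip_house_numbers("Street abc 34")
b = strip_house_numbers("Street abc 37")
print(levenshtein(a, b))            # same street once the numbers are gone
print(levenshtein(a, "Stret abc"))  # a real misspelling still shows up
```

Stripping before comparing also means each distinct street name only has to be checked once, rather than once per house number.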

Upvotes: 3
