user5326167

Extract relevant address from string?

I am developing an address-matching application using the Google Geocoding API. The problem is that some of the addresses in the database I am trying to validate look like this:

ATTN: Mr. THOMAS WONG 2457 Yonge St., Toronto, ON, N2S 2V5, Canada

rather than

2457 Yonge St., Toronto, ON, N2S 2V5, Canada

The first string returns null results (because it starts with a person's name); the second one validates and returns a full, correct address.

My question is: What would be the right approach to this issue? I am thinking of a way to extract only the relevant part from the address string (with some function) but maybe there are better ideas?

Thank you, M.R.

Upvotes: 1

Views: 904

Answers (2)

DwB

Reputation: 38300

If the desired part of the address always starts with a number, try this:

  1. find the first digit in the string.
  2. get a substring from the first digit to the end of the string.
  3. you now have the address.
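
The three steps above can be sketched in a few lines of Python. This is only an illustration of the "starts with a number" assumption, not a general parser; the function name `strip_leading_name` is made up for this example.

```python
import re


def strip_leading_name(raw: str) -> str:
    """Return the substring that starts at the first digit.

    Assumption: the useful part of the address begins with a number.
    If the string contains no digit, it is returned unchanged.
    """
    match = re.search(r"\d", raw)  # step 1: find the first digit
    if match is None:
        return raw
    return raw[match.start():]  # step 2: substring from that digit to the end


print(strip_leading_name(
    "ATTN: Mr. THOMAS WONG 2457 Yonge St., Toronto, ON, N2S 2V5, Canada"
))
# prints: 2457 Yonge St., Toronto, ON, N2S 2V5, Canada
```

Note that an input such as "Jackie Blam, P.O. Box 78, Hootville, OH" comes back as "78, Hootville, OH", which is why the assumption only holds for some formats.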

In order to parse addresses, you need to know all possible formats.

Do you need to include:

  • Santa, North Pole.
  • The Queen, Great Britain
  • Captain Hootberry
  • Bob Goldenberry, rural route 7, MN
  • Jackie Blam, P.O. Box 78, Hootville, OH

For a comprehensive address parsing solution, you will need to provide several algorithms for different address formats then determine which algorithm to use based on the input.
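
As a rough sketch of that idea, the Python below tries a few format-specific extractors in order and returns the first hit. The three patterns (P.O. box, rural route, street number) and all function names are assumptions chosen to fit the examples in this answer; a real solution would need many more formats and much stricter patterns.

```python
import re
from typing import Callable, List, Optional

# Each extractor handles one hypothetical address format and returns
# the address part of the string, or None if the format does not apply.


def po_box(raw: str) -> Optional[str]:
    m = re.search(r"\bP\.?\s*O\.?\s*Box\s+\d+", raw, re.IGNORECASE)
    return raw[m.start():] if m else None


def rural_route(raw: str) -> Optional[str]:
    m = re.search(r"\brural\s+route\s+\d+", raw, re.IGNORECASE)
    return raw[m.start():] if m else None


def street_number(raw: str) -> Optional[str]:
    # A run of digits followed by whitespace and more text, e.g. "2457 Yonge".
    m = re.search(r"\b\d+\s+\S", raw)
    return raw[m.start():] if m else None


# Order matters: more specific formats are tried before the generic one.
EXTRACTORS: List[Callable[[str], Optional[str]]] = [
    po_box,
    rural_route,
    street_number,
]


def extract_address(raw: str) -> Optional[str]:
    """Return the first extractor's result, or None if no format matches."""
    for extractor in EXTRACTORS:
        result = extractor(raw)
        if result is not None:
            return result
    return None


print(extract_address("Jackie Blam, P.O. Box 78, Hootville, OH"))
print(extract_address("Bob Goldenberry, rural route 7, MN"))
print(extract_address("Santa, North Pole"))
```

Inputs that match none of the known formats, such as "Santa, North Pole", return `None` here, so the caller can decide whether to send them to the geocoder unchanged or flag them for review.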

Upvotes: 1

Matt

Reputation: 23729

I work at SmartyStreets and wrote the address extractor which we now offer with LiveAddress API. It's hard. There are a lot of assumptions you need to force yourself not to make, including "if the address starts with a number." (Sorry DwB -- there's a lot to consider.)

If you have US addresses, you may still find our tool useful (it's free to sign up and use, to a point). Here's another Stack Overflow post about the extraction utility: https://stackoverflow.com/a/16448034/1048862

The best way to do this would be to use an address validation service -- one that can validate delivery points and not just address ranges (range-only matching is the most common kind, so be wary of claims to "address validation" when it's really just guessing within certain bounds).

Be aware, too, that Google does not validate addresses. It may standardize them, and it will return results for where the address would exist if it were real. If the address turns out to be valid, it's your lucky day.

Upvotes: 1
