Reputation: 50422
I'm using JavaScript to parse some data and have run into a bit of a pickle.
I have a field that is 1-3 lines of data.
Usually it is only one line, representing a street address:
1234 Hollywood St.
But sometimes it is something like this:
Beverly Hills Shopping Center 1234 Hollywood St.
Other times it is this:
1234 Hollywood St Ste 12
And other times it's stuff like this:
1234 Hollywood St 2nd Floor (between Hollywood St and Tom Cruise Ave)
I'd really like to know which line is the street address. Currently, I'm trying to identify which line is "Address line 2", meaning the suite number, floor number, etc. I don't really need address line 2 itself, but by process of elimination it helps me find the street address.
Is there a nice tool available, like a regex function or something that will tell me if a string is likely a street address?
Or is there another way that I could be handling this?
Thanks!
Edit:
This algorithm does not need to be 100% accurate. I'm preparing the address to be sent to the Google Maps API to be verified. I could try each line of the address to see which one is valid, but this would increase the number of calls to Google and carry a small but nonzero chance of a false positive.
I'd like to be able to scrub the data a little before verifying through Google, to reduce both errors and the need for extra calls.
Upvotes: 2
Views: 2333
Reputation: 5985
As stated in another answer, this is a job for an address verification service. Please note that the Google Maps API is not an address verification service; it is better described as a very capable address approximation service, and the difference is notable.
Address verification implies that an address is real at the present time, meaning that it corresponds to an actual location. It often implies that an address is deliverable (depending on the business need).
I'm a software developer at SmartyStreets, an address verification company. We provide a batch processing tool that I think is a good fit for your use case. Since our system accepts up to two input lines for the street address, I suggest generating a few permutations for each address that has more than two street address lines. It is also very fast (1 million addresses are processed in less than an hour) and, because it's an online service, doesn't require any interaction from us.
The other bit of good news is that you may not even need to send the addresses to the Google Maps API, because they will already be Delivery-Point Validated. But that will depend on your exact needs.
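As a rough illustration of the permutation idea, here is a minimal sketch that turns a field of up to three lines into two-line candidates. The function name `twoLineCandidates` and the `line1`/`line2` property names are made up for this example and are not the field names of any particular service.

```javascript
// Hypothetical helper: builds every ordered pair of lines so that each
// candidate fits a service accepting at most two street-address lines.
function twoLineCandidates(lines) {
  // Drop empty lines and surrounding whitespace first.
  const cleaned = lines.map((l) => l.trim()).filter((l) => l.length > 0);

  // One or two lines already fit; pass them through unchanged.
  if (cleaned.length <= 2) {
    return [{ line1: cleaned[0] || "", line2: cleaned[1] || "" }];
  }

  // More than two lines: try each line as line 1 paired with each other line.
  const candidates = [];
  for (let i = 0; i < cleaned.length; i++) {
    for (let j = 0; j < cleaned.length; j++) {
      if (i !== j) {
        candidates.push({ line1: cleaned[i], line2: cleaned[j] });
      }
    }
  }
  return candidates;
}

const candidates = twoLineCandidates([
  "Beverly Hills Shopping Center",
  "1234 Hollywood St",
  "Ste 12",
]);
console.log(candidates.length);
```

A three-line field produces six ordered pairs, so filtering out obviously wrong lines before submitting keeps the batch smaller.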
Update: SmartyStreets now provides international address verification.
Upvotes: 2
Reputation:
First of all, have a look at the following official USPS abbreviation lists:
Street Suffix Abbreviations
Secondary Unit Designators
Then you will have an idea of what to expect as input, but you also have to take into account all the possible unofficial variations, punctuation, and so on. That is a lot to handle.
In general, a street address line should start with a number followed by a space (which separates it from "2nd floor" and the like), then one or more words, and finally a street suffix abbreviation. For the city, state, zip tuple, you again have to handle both full state names and their abbreviations (including short variations like N York, N.York, or N. York), and remember the zip5 and zip5+4 cases.
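The street-line rule above can be sketched as a pair of regular expressions. This is only a heuristic under stated assumptions: the suffix and unit lists below are small hand-picked samples rather than the full USPS tables, and the names `classifyLine` and `pickStreetLine` are invented for this example.

```javascript
// Small, deliberately incomplete samples; extend from the USPS lists as needed.
const STREET_SUFFIXES = ["st", "street", "ave", "avenue", "blvd", "boulevard",
  "rd", "road", "dr", "drive", "ln", "lane", "ct", "court", "way", "pl", "place"];
const UNIT_WORDS = ["ste", "suite", "apt", "apartment", "unit", "fl", "floor",
  "bldg", "building", "rm", "room"];

// Number, whitespace, one or more words, then a street suffix (optional period).
const STREET_LINE = new RegExp(
  "^\\d+\\s+(?:[\\w.'-]+\\s+)+?(?:" + STREET_SUFFIXES.join("|") + ")\\b\\.?",
  "i"
);

// "#12", "Ste 12", "Apt. #4B", or an ordinal floor such as "2nd Floor".
const UNIT_LINE = new RegExp(
  "^(?:#\\s*\\d+" +
    "|(?:" + UNIT_WORDS.join("|") + ")\\b\\.?\\s*#?\\s*\\w+" +
    "|\\d+(?:st|nd|rd|th)\\s+(?:floor|fl)\\b)",
  "i"
);

// Returns "street", "unit", or "other" for a single line of the field.
function classifyLine(line) {
  const text = line.trim();
  if (STREET_LINE.test(text)) return "street";
  if (UNIT_LINE.test(text)) return "unit";
  return "other";
}

// Returns the first line that looks like a street address, or null.
function pickStreetLine(lines) {
  return lines.find((l) => classifyLine(l) === "street") || null;
}

console.log(pickStreetLine(["Beverly Hills Shopping Center", "1234 Hollywood St."]));
```

Because the street pattern only anchors at the start, a combined line such as "1234 Hollywood St Ste 12" is still classified as a street line. Unofficial spellings and suffixes missing from the sample list will slip through, so this is only good for scrubbing before real verification.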
Upvotes: 1
Reputation: 11351
There are web services to which you can pass an address and get back a well-formed JSON/XML object of the parsed address. Perhaps something like that would help? As some of the comments state, you won't be able to do this simply with JavaScript.
Here is one service I have personally looked into using. You'll need to get familiar with the APIs.
Upvotes: 1