Reputation: 22334
Using Ruby (newb) and regex, I'm trying to strip the street number from a street address so that only the street name remains. The easy ones aren't a problem, but I need some help with:
'6223 1/2 S FIGUEROA ST' ==> 'S FIGUEROA ST'
Thanks for the help!!
UPDATE(s):
'6223 1/2 2ND ST' ==> '2ND ST'
and from @pesto '221B Baker Street' ==> 'Baker Street'
Upvotes: 3
Views: 1781
Reputation: 5267
Ouch! Parsing an address on its own can be extremely nasty unless you're working with standardized addresses. The reason is that the "primary number", often called the house number, can appear at various locations within the string, for example:
It's not a trivial undertaking. Depending on the needs of your application, your best bet for getting accurate information is to use an address verification web service. There are a handful of providers that offer this capability.
In the interest of full disclosure, I'm the founder of SmartyStreets. We have an address verification web service API that will validate and standardize your address to make sure it's real and allow you to get the primary/house number portion. You're more than welcome to contact me personally with questions.
Upvotes: 1
Reputation: 118128
Can street names be numbers as well? E.g.
1234 45TH ST
or even
1234 45 ST
You could deal with the first case above, but the second is difficult.
I would split the address on spaces, skip any leading components that do not contain a letter and then join the remainder. I do not know Ruby, but here is a Perl example which also highlights the problem with my approach:
#!/usr/bin/perl

use strict;
use warnings;

my @addrs = (
    '6223 1/2 S FIGUEROA ST',
    '1234 45TH ST',
    '1234 45 ST',
);

for my $addr ( @addrs ) {
    # Split on single spaces and discard leading tokens
    # until one contains a capital letter.
    my @parts = split / /, $addr;
    while ( @parts ) {
        my $part = shift @parts;
        if ( $part =~ /[A-Z]/ ) {
            # Print that token plus everything after it.
            print join(' ', $part, @parts), "\n";
            last;
        }
    }
}
C:\Temp> skip
S FIGUEROA ST
45TH ST
ST
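Since the question asks about Ruby, here is a rough Ruby translation of the Perl approach above, offered as a sketch rather than a tested library routine. The method name `skip_numeric_tokens` is made up for this example, and the sketch keeps the same weakness: '1234 45 ST' collapses to 'ST'.

```ruby
# Rough Ruby equivalent of the Perl example: drop leading tokens
# until one contains a capital letter, then rejoin the rest.
# (skip_numeric_tokens is a hypothetical name chosen for this sketch.)
def skip_numeric_tokens(address)
  # split(' ') splits on runs of whitespace, awk-style.
  parts = address.split(' ')
  parts.drop_while { |part| part !~ /[A-Z]/ }.join(' ')
end

[
  '6223 1/2 S FIGUEROA ST',
  '1234 45TH ST',
  '1234 45 ST'
].each { |addr| puts skip_numeric_tokens(addr) }
```

It prints the same three lines as the Perl script.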
Upvotes: 1
Reputation: 23311
There's another Stack Overflow question with a set of answers on this: Parse usable Street Address, City, State, Zip from a string
I think the Google/Yahoo geocoder approach is best, but it depends on how many addresses you're dealing with and how often; otherwise the accepted answer there is probably the best.
Upvotes: 1
Reputation: 27596
Group matching:
.*\d\s(.*)
If you need to also take into account apartment numbers:
.*\d.*?\s(.*)
That would take care of '123A Street Name'.
That should strip the numbers at the front (and the space) as long as there are no other numbers in the string. Just capture the first group, (.*).
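In Ruby, `String#[]` accepts a regex plus a group index, which is a compact way to "just capture the first group". A minimal sketch; the constant and method names are illustrative, not from the answer:

```ruby
# The two patterns from the answer above.
SIMPLE      = /.*\d\s(.*)/    # text after the last digit followed by whitespace
WITH_SUFFIX = /.*\d.*?\s(.*)/ # also allows a suffix such as the "A" in "123A"

# String#[](regex, 1) returns capture group 1, or nil when there is no match.
def capture_street(address, pattern)
  address[pattern, 1]
end

p capture_street('6223 1/2 S FIGUEROA ST', SIMPLE)   # => "S FIGUEROA ST"
p capture_street('123A Street Name', WITH_SUFFIX)    # => "Street Name"
p capture_street('221B Baker Street', SIMPLE)        # => nil
p capture_street('6223 1/2 2ND ST', WITH_SUFFIX)     # => "ST"
```

The last line shows the caveat about other numbers in the string: because `.*` is greedy, the digit in "2ND" is treated as part of the house number.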
Upvotes: 2
Reputation: 23880
This will strip anything at the front of the string until it hits a letter:
street_name = address.gsub(/^[^a-zA-Z]*/, '')
If it's possible to have something like "221B Baker Street", then you have to use something more complex. This should work:
street_name = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')
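A quick way to check these patterns against all of the question's examples is a small Ruby script. The `HOUSE_NUMBER` pattern is a hypothetical stricter alternative, not part of this answer: it assumes the house number is always the first token (digits plus an optional letter suffix), optionally followed by a fraction such as "1/2".

```ruby
# Patterns from the answer above.
LEADING_NON_LETTERS = /^[^a-zA-Z]*/
LEADING_WITH_SUFFIX = /^((\d[a-zA-Z])|[^a-zA-Z])*/

# Hypothetical stricter pattern (an assumption, not from the answer): the
# house number is the first token, made of digits plus an optional letter
# suffix, optionally followed by a fraction such as "1/2".
HOUSE_NUMBER = /\A\d+[A-Za-z]?(?:\s+\d+\/\d+)?\s+/

['6223 1/2 S FIGUEROA ST', '6223 1/2 2ND ST', '221B Baker Street'].each do |address|
  p address.gsub(LEADING_NON_LETTERS, '')
  p address.gsub(LEADING_WITH_SUFFIX, '')
  p address.sub(HOUSE_NUMBER, '')
end
```

On '6223 1/2 2ND ST' the first two patterns return "ND ST" and "D ST", because they also consume the leading digit of "2ND"; the anchored pattern returns "2ND ST". It only holds up as long as its assumption about the house-number format does.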
Upvotes: 3
Reputation: 1873
For future reference, a great tool for testing Ruby regexes is http://www.rubular.com/
Upvotes: 0
Reputation: 14185
/[^\d]+$/
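will also pull out the street name without using a capture group: it matches the trailing run of non-digit characters. Note that the match includes the space before the name (' S FIGUEROA ST'), so you may want to strip it.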
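In Ruby the same pattern can be applied with `String#[]`, and `strip` removes the leading space the match keeps. A minimal sketch; the method name `trailing_non_digits` is hypothetical:

```ruby
# [^\d]+$ matches the trailing run of non-digit characters.
# String#[] returns nil when nothing matches; to_s turns that into "".
def trailing_non_digits(address)
  address[/[^\d]+$/].to_s.strip
end

p trailing_non_digits('6223 1/2 S FIGUEROA ST')  # => "S FIGUEROA ST"
p trailing_non_digits('221B Baker Street')       # => "B Baker Street"
```

The second call shows that this pattern alone does not cover the "221B" case from the question's update.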
Upvotes: 0