Reputation: 427
Trying to identify (and remove) street suffixes (like "St", "Dr", etc.) from addresses. Assume that the suffixes are uniform and that we can create a comprehensive list of them.
Thanks!
street_suffix_list = ["St", "Dr", "Ave", "Blvd", "Tr"]
address = "105 Main St"
# returns "Main St" (strips the leading house number)
street = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')
# desired: "St"
street_suffix = nil # ???
# desired: "Main"
street_name = nil # ???
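One way to fill in both placeholders is sketched below. This is not part of the original question. It assumes the suffix is always the last word, reuses the question's own `gsub` to drop the house number, and swaps in `String#rpartition` to split off the last word.

```ruby
street_suffix_list = ["St", "Dr", "Ave", "Blvd", "Tr"]
address = "105 Main St"

# Same leading-number strip as in the question: "105 Main St" -> "Main St"
street = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')

# String#rpartition splits on the LAST space: "Main St" -> ["Main", " ", "St"]
name, _sep, last_word = street.rpartition(' ')

if street_suffix_list.include?(last_word)
  street_suffix = last_word # => "St"
  street_name = name        # => "Main"
else
  # No recognised suffix: keep the whole street and report no suffix
  street_suffix = nil
  street_name = street
end

p street_suffix # "St"
p street_name   # "Main"
```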
Upvotes: 1
Views: 665
Reputation: 16507
You just need to separate the street from the suffix with a Regexp:
street_suffix_list = ["St", "Dr", "Ave", "Blvd", "Tr"]
address = "105 Main St"
idx = /(#{street_suffix_list.join('|')})\z/ =~ address
# $1 => St
sfx = $1
street = address[0..idx-1].strip
# street => "105 Main"
It is better to build the alternation with the safer Regexp::union method, which escapes each suffix, and to anchor the suffix on a word boundary (thanks @Jordan):
idx = /\b(#{Regexp.union(street_suffix_list)})\z/ =~ address
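Note that `idx` is `nil` when no suffix ends the address, so `address[0..idx-1]` would then raise a `NoMethodError`. A minimal sketch of a guarded version follows. The method name `split_street_suffix` is hypothetical, and it reads the result from the `MatchData` object instead of the global `$1`.

```ruby
# Hypothetical helper: returns [street, suffix], or [address, nil] when no
# suffix from the list ends the address.
def split_street_suffix(address, suffixes)
  # Regexp.union escapes each entry; \b stops "MainSt" from matching "St"
  pattern = /\b(#{Regexp.union(suffixes)})\z/
  match = pattern.match(address)
  return [address.strip, nil] if match.nil?

  # Everything before the matched suffix, with surrounding whitespace trimmed
  [match.pre_match.strip, match[1]]
end

suffixes = ["St", "Dr", "Ave", "Blvd", "Tr"]
p split_street_suffix("105 Main St", suffixes) # ["105 Main", "St"]
p split_street_suffix("105 Main", suffixes)    # ["105 Main", nil]
```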
Upvotes: 2
Reputation: 4970
If you know that the suffix will always be the last word in the string, you don't need regexes at all:
suffixes = %w(st ave dr rd blvd)
address = '105 Main St'
tokens = address.split
# => ["105", "Main", "St"]
found_match = suffixes.include?(tokens.last.downcase)
# => true
if found_match
  street_suffix = tokens.last   # => "St"
  street_rest = tokens[0..-2]   # => ["105", "Main"]
  # ...
  puts street_suffix            # prints: St
  puts street_rest.join(' ')    # prints: 105 Main
end
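The same split-based idea can be wrapped in a reusable method, sketched below. The method name `split_last_word_suffix` is hypothetical. Storing the suffixes in a `Set` gives constant-time lookup. Ignoring one trailing dot, so that "St." also counts, is an added assumption that the answer does not cover. `String#delete_suffix` requires Ruby 2.5 or later.

```ruby
require 'set'

# Lower-case suffixes, as in the answer above
SUFFIXES = Set.new(%w(st ave dr rd blvd))

# Hypothetical helper: returns [rest, suffix], or [address, nil] if the last
# word is not a known suffix (case-insensitive, ignoring one trailing dot).
def split_last_word_suffix(address)
  tokens = address.split
  # A single word is treated as a street name, never as a bare suffix
  return [address.strip, nil] if tokens.size < 2

  last = tokens.last
  if SUFFIXES.include?(last.downcase.delete_suffix('.'))
    [tokens[0..-2].join(' '), last]
  else
    [address.strip, nil]
  end
end

p split_last_word_suffix('105 Main St')  # ["105 Main", "St"]
p split_last_word_suffix('105 Main st.') # ["105 Main", "st."]
```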
That said, you will have a hard time accounting for all the variations that addresses can contain. I strongly suggest using a gem for this, such as the StreetAddress gem mentioned by @oystersauce8.
Upvotes: 0
Reputation: 626794
You can build a dynamic regex pattern from the suffix list as an alternation (also matching any dots after the suffix, so that punctuation is removed too, if present):
/\b(?:St|Dr|Ave|Blvd|Tr)\b\.*/
Here is sample Ruby code:
street_suffix_list = ["St", "Dr", "Ave", "Blvd", "Tr"]
address = "105 Main St"
puts address.gsub(/\b(?:#{street_suffix_list.join("|")})\b\.*/, "").strip
# => 105 Main
NOTE that without the word boundaries, you would also remove the Tr in Transylvania and similar words.
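Because `gsub` removes every listed suffix wherever it appears, an address such as "12 St James Ave" loses its leading "St" too. A minimal sketch of an end-anchored variant follows. It assumes the suffix to strip is always at the end of the address. The method name `strip_trailing_suffix` is hypothetical, and it uses `Regexp.union` so that each entry is escaped.

```ruby
# Hypothetical helper: removes a listed suffix (plus any trailing dots) only
# when it ends the address; word boundaries still protect "Transylvania".
def strip_trailing_suffix(address, suffixes)
  pattern = /\s*\b(?:#{Regexp.union(suffixes)})\b\.*\s*\z/
  address.sub(pattern, "")
end

list = ["St", "Dr", "Ave", "Blvd", "Tr"]
puts strip_trailing_suffix("105 Main St", list)     # 105 Main
puts strip_trailing_suffix("105 Main St.", list)    # 105 Main
puts strip_trailing_suffix("12 St James Ave", list) # 12 St James
```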
Upvotes: 2
Reputation: 551
Using the StreetAddress gem, you can parse a US address and extract its components:
gem install StreetAddress
Then, in irb:
require 'street_address'
address = StreetAddress::US.parse("1600 Pennsylvania Ave, Washington, DC, 20500")
# => 1600 Pennsylvania Ave, Washington, DC 20500
address.street
# => "Pennsylvania"
Upvotes: 3