Zack Herbert

Reputation: 960

Regular Expressions to parse addresses

I am trying to learn how to use regex to parse location/address strings. Unfortunately, the data I have been given is inconsistent and does not follow the way most addresses are written. Below is what I have so far. The problem is that I need to parse the string multiple times to get it down to the proper format.

Take the following string, for example: "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649". The end result that I want is "110 Spruce, Greenwood, SC 29649".

CODE:

l = nil
location_str = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"
1.upto(4).each do |attempt|
  l = Location.from_string(location_str)
  puts "TRYING: #{location_str}"
  break if !l.nil?
  location_str.gsub!(/^[^,:\-]+\s*/, '')
end

OUTPUT:

TRYING: 102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649

EXPECTED:

TRYING: 102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: 110 Spruce, Greenwood, SC 29649
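
The stall in the output appears to come from the pattern /^[^,:\-]+\s*/: it removes the text up to the first delimiter but leaves the delimiter itself, so every later attempt starts with "," and the pattern (which needs at least one non-delimiter character at the start) matches nothing. Below is a minimal sketch of a loop that also consumes the delimiter. The method name parse_with_retries, the max_attempts keyword, and toy_parser are hypothetical; toy_parser only stands in for Location.from_string, whose real behavior is not shown in the question.

```ruby
# Drops one leading "segment plus delimiter" per attempt until the parser
# accepts the string. `parser` is any callable that returns nil on failure;
# it stands in for Location.from_string (an assumption for illustration).
def parse_with_retries(location_str, parser, max_attempts: 4)
  candidate = location_str.dup # avoid mutating the caller's string
  max_attempts.times do
    result = parser.call(candidate)
    return result unless result.nil?
    # Consume the delimiter and any following whitespace as well.
    # sub! returns nil when nothing was replaced, so stop in that case.
    break unless candidate.sub!(/\A[^,:\-]*[,:\-]\s*/, "")
  end
  nil
end

# Toy parser (hypothetical): accepts a string with exactly three comma-separated parts.
toy_parser = ->(s) { s.count(",") == 2 ? s : nil }

parse_with_retries("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649", toy_parser)
# => "110 Spruce, Greenwood, SC 29649"
```

Note that treating "-" as a delimiter, as the original pattern does, would also cut hyphenated place names, so that character class may need adjusting for real data.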

Upvotes: 1

Views: 128

Answers (3)

JWT

Reputation: 407

On the assumption that the format is:

"Stuff you aren't interested in, more stuff, more stuff, etc., house, city, state zip"

then you just take the last 3 sections by anchoring to the end of the string using a dollar sign:

location_str[/[^,]*,[^,]*,[^,]*$/]
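
One detail worth noting: because [^,]* also matches the space after a comma, the slice above comes back with a leading space. A minimal sketch that trims it is shown here. The method name last_three_sections is hypothetical, and \z is swapped in for $ because in Ruby $ matches at the end of any line, while \z matches only at the end of the string.

```ruby
# Hypothetical helper name; wraps the same "last three sections" pattern.
def last_three_sections(location_str)
  # \z anchors at the very end of the string.
  match = location_str[/[^,]*,[^,]*,[^,]*\z/]
  # String#[] returns nil when there are fewer than two commas,
  # so use safe navigation before stripping surrounding whitespace.
  match&.strip
end

last_three_sections("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649")
# => "110 Spruce, Greenwood, SC 29649"
```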

Upvotes: 1

Jordan Running

Reputation: 106027

This is one of those there's-more-than-one-way-to-do-it things. Here's yet another:

def address_from_location_string(location)
  *_, address, city, state_zip = location.split(/\s*,\s*/)
  "#{address}, #{city}, #{state_zip}"
end

address_from_location_string("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649")
# => "110 Spruce, Greenwood, SC 29649"
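
If some inputs might have fewer than three comma-separated parts, the destructuring above can leave one or more of its variables nil. A small variant that returns nil instead could look like the sketch below; the method name address_or_nil is hypothetical, and returning nil for short inputs is an assumption about what the caller wants.

```ruby
# Hypothetical variant: keep the last three comma-separated parts,
# or return nil when the input has fewer than three.
def address_or_nil(location)
  parts = location.split(/\s*,\s*/)
  return nil if parts.length < 3
  parts.last(3).join(", ")
end

address_or_nil("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649")
# => "110 Spruce, Greenwood, SC 29649"
address_or_nil("Greenwood, SC 29649")
# => nil
```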

Upvotes: 2

Anthony

Reputation: 15967

An attempt without regex:

address = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"
elements = address.split(",").map(&:strip)
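# The last two elements are the city and the "state zip" pair.
city, state_and_zip = elements.last(2)
# Everything before them is a street address; the last one is the one we want.
addresses = elements[0...-2]

# Join with ", " so the result matches the format asked for in the question.
p [addresses.last, city, state_and_zip].join(", ")

output:

"110 Spruce, Greenwood, SC 29649"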

Upvotes: 0
