Reputation: 12749
I'm trying to write a regex that can handle input in either of these forms:
1. Beverly Hills, CA
2. Beverly Hills, CA 90210
I have this:
^(.+)[,\\s]+(.+)\s+(\d{5})?$
It works for case #2, but not #1. If I change the \s+ to \s*, then it works for #1 but not #2.
You can play around with it here: http://rubular.com/r/oqKBJ4r8cq
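To reproduce the behaviour described above, here is a minimal Ruby sketch. The sample strings are the ones used elsewhere in this thread (an assumption about the asker's two cases), and the constant and method names are hypothetical.

```ruby
# Sample inputs as used elsewhere in the thread (assumed to be the asker's two cases).
INPUTS = ["Beverly Hills, CA", "Beverly Hills, CA 90210"].freeze

ORIGINAL = /^(.+)[,\\s]+(.+)\s+(\d{5})?$/ # pattern from the question
RELAXED  = /^(.+)[,\\s]+(.+)\s*(\d{5})?$/ # same pattern with \s+ changed to \s*

# Hypothetical helper: returns the capture groups as an array,
# or nil when the pattern does not match.
def captures_for(pattern, input)
  match = pattern.match(input)
  match && match.captures
end

INPUTS.each do |input|
  puts "#{input.inspect}:"
  puts "  original: #{captures_for(ORIGINAL, input).inspect}"
  puts "  relaxed:  #{captures_for(RELAXED, input).inspect}"
end
```

With \s+ the input without a ZIP code does not match at all. With \s* both inputs match, but the greedy second group swallows the ZIP code, so the third group stays empty.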
Upvotes: 2
Views: 7222
Reputation: 351526
Try this instead:
^([^,]+),\s([A-Z]{2})(?:\s(\d{5}))?$
This expression works on both examples, captures each piece of the address in separate groups, and properly handles whitespace.
Here is how it breaks down:
^             # anchor to the start of the string
([^,]+)       # match everything except a comma one or more times
,             # match the comma itself
\s            # match a single whitespace character
([A-Z]{2})    # now match a two-letter state code
(?:           # create a non-capturing group
  \s          # match a single whitespace character
  (\d{5})     # match a 5-digit number
)?            # this whole group is optional
$             # anchor to the end of the string
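As a quick check of the breakdown above, the pattern can be exercised in Ruby. This is only a minimal sketch: the parse_address helper and the hash keys are hypothetical names, and the sample inputs are assumed.

```ruby
ADDRESS = /^([^,]+),\s([A-Z]{2})(?:\s(\d{5}))?$/

# Hypothetical helper: returns a hash of the captured parts,
# or nil when the input does not match the pattern.
def parse_address(input)
  match = ADDRESS.match(input)
  return nil unless match

  { city: match[1], state: match[2], zip: match[3] }
end

p parse_address("Beverly Hills, CA 90210") # city, state and zip all captured
p parse_address("Beverly Hills, CA")       # zip is nil because the optional group is skipped
p parse_address("Beverly Hills CA")        # no comma, so there is no match
```

Because the ZIP code sits inside the optional non-capturing group, the third capture is nil rather than an empty string when it is absent.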
Upvotes: 6
Reputation: 14121
((?:\w|\s)+),\s(AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)
Here is a long one that only accepts valid state codes. Note that it is not anchored and does not capture the ZIP code.
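If the long alternation is hard to maintain, one option is to build it from a list. The following is a rough sketch, not the answerer's code: STATE_CODES is deliberately abbreviated for illustration, and the constant names are hypothetical.

```ruby
# Abbreviated for illustration; a real list would hold every code from the pattern above.
STATE_CODES = %w[AL AK AZ CA NY TX].freeze

# Regexp.union joins the codes into a single alternation (AL|AK|...).
STATE = Regexp.union(STATE_CODES)

# Same shape as the pattern above: city, comma, one whitespace character, state code.
CITY_STATE = /((?:\w|\s)+),\s(#{STATE})/

match = CITY_STATE.match("Beverly Hills, CA 90210")
p match[1] # city
p match[2] # state code

p CITY_STATE.match("Beverly Hills, ZZ") # ZZ is not in the list, so no match
```

Like the pattern it mirrors, this sketch is unanchored and ignores anything after the state code.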
Upvotes: 0
Reputation: 14453
["Beverly Hills, CA 90210", "Beverly Hills, CA"].each do |s|
  m = s.match(/^([^,]*),\s*(\w*)\s*(\d*)?$/)
  m[1] # => "Beverly Hills", "Beverly Hills"
  m[2] # => "CA", "CA"
  m[3] # => "90210", ""
end
The # => comments show the results for both runs.
Upvotes: 0
Reputation: 222198
Try this:
^(.+)[,\\s]+(.+?)\s*(\d{5})?$
http://rubular.com/r/qS0e5vAQnT
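Here is a minimal Ruby sketch of why the lazy (.+?) helps, using assumed sample inputs. The non-greedy group takes as little as possible, which leaves the digits for the optional (\d{5}) group. The split_address helper is a hypothetical name.

```ruby
LAZY = /^(.+)[,\\s]+(.+?)\s*(\d{5})?$/ # pattern from the answer above

# Hypothetical helper: returns [city, state, zip], with zip as nil when absent,
# or nil when the input does not match.
def split_address(input)
  match = LAZY.match(input)
  return nil unless match

  # Group 2 keeps the space that follows the comma, so strip it.
  [match[1], match[2].strip, match[3]]
end

p split_address("Beverly Hills, CA 90210") # prints ["Beverly Hills", "CA", "90210"]
p split_address("Beverly Hills, CA")       # prints ["Beverly Hills", "CA", nil]
```

One thing to be aware of: in a Ruby regex literal, [,\\s] is a class containing a comma, a backslash, and the letter s, not whitespace. That is why the space after the comma ends up in the second group and has to be stripped.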
Upvotes: 6