Reputation: 1242
I am trying to take a block of numbers that may, or may not, have dividers and return them in a standard format. Using SSN as an example:
ex1="An example 123-45-6789"
ex2="123.45.6789 some more things"
ex3="123456789 thank you Ruby may I have another"
These should all go into a method that returns "123-45-6789". Basically, any separator (INCLUDING nothing) other than a number or letter should produce an SSN in XXX-XX-XXXX format. The part that is stumping me is how to get a regular expression to recognize that there may be nothing between the digit groups.
What I have so far in IDENTIFYING my ssn:
def format_ssns(string)
  string.scan(/\d{3}[^0-9a-zA-Z]{1}\d{2}[^0-9a-zA-Z]{1}\d{4}/).to_a
end
It seems to work for everything I expect EXCEPT when there is no separator at all: "123456789" does not match. Can I use regular expressions in this case to identify the absence of a separator?
Upvotes: 31
Views: 75017
Reputation: 275
Have you tried to match 0 or 1 characters between your numbers?
\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}
Upvotes: 11
Reputation: 128317
This has already been shared in a comment, but just to provide a complete-ish answer...
You have these tools at your disposal:
x matches x exactly once
x{a,b} matches x between a and b times
x{a,} matches x at least a times
x{,b} matches x up to (a maximum of) b times
x* matches x zero or more times (same as x{0,})
x+ matches x one or more times (same as x{1,})
x? matches x zero or one time (same as x{0,1})
So you want to use that last one, since it's exactly what you're looking for (zero or one time).
/\d{3}[^0-9a-zA-Z]?\d{2}[^0-9a-zA-Z]?\d{4}/
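As a quick check, here is a minimal sketch that runs that pattern against the three strings from the question. The constant name SSN_OPTIONAL_SEP and the variables examples and found are just illustrative names, not anything from the original code.

```ruby
# Pattern from the answer above: each separator is optional (zero or one).
SSN_OPTIONAL_SEP = /\d{3}[^0-9a-zA-Z]?\d{2}[^0-9a-zA-Z]?\d{4}/

examples = [
  "An example 123-45-6789",
  "123.45.6789 some more things",
  "123456789 thank you Ruby may I have another"
]

# String#scan returns an array of every match. The pattern has no capture
# groups, so each element is the full matched string.
found = examples.map { |text| text.scan(SSN_OPTIONAL_SEP) }

found.each { |matches| p matches }
```

Each of the three inputs should yield exactly one match, including the one with no separators.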
Upvotes: 66
Reputation: 56809
Your current regex will allow 123-45[6789, not to mention all kinds of Unicode characters and control characters. In the extreme case:
123
45師6789
is considered a match by your regex.
You can use a backreference to make sure both separators are the same.
/\d{3}([.-]?)\d{2}\1\d{4}/
[.-]? will match either ., -, or nothing (due to the optional ? quantifier). Whatever matched here will be used to make sure that the second separator is the same via backreference.
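To illustrate, a small sketch of how the backreference behaves in Ruby. The constant SAME_SEP and the variable names are made up for this example. It uses String#[] with a regex, which returns the matched substring or nil.

```ruby
# Same pattern as above: \1 must repeat whatever the first group captured,
# including the empty string when there is no separator.
SAME_SEP = /\d{3}([.-]?)\d{2}\1\d{4}/

consistent   = "123-45-6789"[SAME_SEP]   # same separator twice
no_separator = "123456789"[SAME_SEP]     # empty separator twice
mixed        = "123-45.6789"[SAME_SEP]   # different separators
bracket      = "123-45[6789"[SAME_SEP]   # the problem case mentioned above

# Note: with a capture group in the pattern, String#scan returns the
# captured groups rather than the whole match.
groups_only = "123-45-6789".scan(SAME_SEP)

p consistent, no_separator, mixed, bracket, groups_only
```

The last line is worth keeping in mind if you stay with scan: once the regex contains a group, you get the group contents back (here the separator), so you would need to capture the digit groups too or switch to a non-scan approach.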
Upvotes: 2
Reputation: 1242
Whelp... looks like I just found my own answer, but any clues for improvement would be helpful.
def format_ssns(string)
  # Both separators must be optional ({0,1}), otherwise "123456789" will not match.
  string.scan(/\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}/).to_a
end
Seems to do the trick.
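One possible improvement, since the goal was to return the XXX-XX-XXXX format and not just find the match: capture the three digit groups and join them with dashes. This is only a sketch. It assumes you want an array with every SSN found in the string, and the constant name SSN_PARTS is made up here.

```ruby
# Capture the three digit groups; the separators are optional and are
# deliberately left out of the captures.
SSN_PARTS = /(\d{3})[^0-9a-zA-Z]?(\d{2})[^0-9a-zA-Z]?(\d{4})/

def format_ssns(string)
  # With capture groups, scan yields one array of groups per match,
  # e.g. ["123", "45", "6789"], which is joined into "123-45-6789".
  string.scan(SSN_PARTS).map { |parts| parts.join("-") }
end

p format_ssns("An example 123-45-6789")
p format_ssns("123.45.6789 some more things")
p format_ssns("123456789 thank you Ruby may I have another")
```

All three example strings from the question should come back as ["123-45-6789"].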
Upvotes: 0