notthehoff

Reputation: 1242

How does one match character OR nothing using regular expression

I am trying to take a block of numbers that may, or may not, have dividers and return them in a standard format. Using SSN as an example:

ex1="An example 123-45-6789"
ex2="123.45.6789 some more things"
ex3="123456789 thank you Ruby may I have another"

should all go into a method that returns "123-45-6789". Basically, any separator (INCLUDING nothing) other than a digit or letter should yield an SSN in XXX-XX-XXXX format. The part that is stumping me is how to get a regular expression to recognize that there can be nothing between the digit groups.

What I have so far in IDENTIFYING my ssn:

def format_ssns(string)
  string.scan(/\d{3}[^0-9a-zA-Z]{1}\d{2}[^0-9a-zA-Z]{1}\d{4}/).to_a
end

It seems to work for everything I expect EXCEPT when there is nothing. "123456789" does not work. Can I use regular expressions in this case to identify lack of anything?

Upvotes: 31

Views: 75017

Answers (4)

user3188140

Reputation: 275

Have you tried to match 0 or 1 characters between your numbers?

\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}

Upvotes: 11

Dan Tao

Reputation: 128317

This has already been shared in a comment, but just to provide a complete-ish answer...

You have these tools at your disposal:

  • x matches x exactly once
  • x{a,b} matches x between a and b times
  • x{a,} matches x at least a times
  • x{,b} matches x up to (a maximum of) b times
  • x* matches x zero or more times (same as x{0,})
  • x+ matches x one or more times (same as x{1,})
  • x? matches x zero or one time (same as x{0,1})

So you want to use that last one, since it's exactly what you're looking for (zero or one time).

/\d{3}[^0-9a-zA-Z]?\d{2}[^0-9a-zA-Z]?\d{4}/
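For what it's worth, here is a minimal Ruby sketch of how that pattern could be combined with capture groups to produce the XXX-XX-XXXX output the question asks for. The names SSN_PATTERN and normalize_ssns are made up for illustration and do not come from the question; treat this as one possible approach, not the definitive one.

```ruby
# Hypothetical helper: find SSN-like runs and return them as XXX-XX-XXXX.
# The capture groups grab the three digit blocks; each separator is optional.
SSN_PATTERN = /(\d{3})[^0-9a-zA-Z]?(\d{2})[^0-9a-zA-Z]?(\d{4})/

def normalize_ssns(string)
  # When the pattern has groups, scan yields one array of captures per match.
  string.scan(SSN_PATTERN).map { |parts| parts.join("-") }
end

normalize_ssns("An example 123-45-6789")        # => ["123-45-6789"]
normalize_ssns("123.45.6789 some more things")  # => ["123-45-6789"]
normalize_ssns("123456789 thank you Ruby")      # => ["123-45-6789"]
```

Because the pattern is not anchored, it will also pick nine digits out of a longer run of digits; adding word boundaries or lookarounds would be a further refinement if that matters for your input.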

Upvotes: 66

nhahtdh

Reputation: 56809

Your current regex will allow 123-45[6789, not to mention all kinds of Unicode characters and control characters. In the extreme case:

123
45師6789

is considered a match by your regex.

You can use a backreference to make sure both separators are the same.

/\d{3}([.-]?)\d{2}\1\d{4}/

[.-]? will match either ., - or nothing (due to the optional ? quantifier). Whatever matched here will be used to make sure that the second separator is the same via backreference.
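A quick way to see the effect, as a sketch: the constant name STRICT_SSN is only for illustration, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
# \1 must repeat whatever the separator group matched:
# ".", "-", or the empty string.
STRICT_SSN = /\d{3}([.-]?)\d{2}\1\d{4}/

STRICT_SSN.match?("123-45-6789")  # => true
STRICT_SSN.match?("123.45.6789")  # => true
STRICT_SSN.match?("123456789")    # => true
STRICT_SSN.match?("123-45.6789")  # => false, the separators differ
STRICT_SSN.match?("123-45[6789")  # => false, "[" is not an allowed separator
```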

Upvotes: 2

notthehoff

Reputation: 1242

Whelp... looks like I just found my own answer, but any clues for improvement would be helpful.

def format_ssns(string)
  string.scan(/\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}/).to_a
end

Seems to do the trick.

Upvotes: 0
