user3794622

Reputation: 23

match SSN anywhere in line without leading or trailing dash

I can match an SSN using:

\b\d{3}-\d{2}-\d{4}\b

It easily matches:

123-45-1234

or

John Doe SSN# 123-12-1235

The problem is it will also match:

100-123-45-1234-99

or

1010-23-3--123-23-1234-56-712

A dash is a non-word character, so \b is still satisfied between a dash and a digit. I can't use ^ because the SSN is sometimes inside a sentence or preceded by whitespace; it doesn't always start at the beginning of a line.
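The behaviour described above can be reproduced with a short Ruby sketch. Ruby is assumed here only because Rubular tests Ruby regexes, and the constant names are made up for illustration. Since \b holds between a dash and a digit, the pattern finds an SSN-shaped substring inside the longer dashed strings as well:

```ruby
# Sketch only: NAIVE_SSN, SAMPLES and NAIVE_MATCHES are illustrative names.
NAIVE_SSN = /\b\d{3}-\d{2}-\d{4}\b/

SAMPLES = [
  "123-45-1234",
  "John Doe SSN# 123-12-1235",
  "100-123-45-1234-99",
  "1010-23-3--123-23-1234-56-712"
].freeze

# String#[] with a Regexp returns the first matched substring, or nil.
NAIVE_MATCHES = SAMPLES.map { |line| line[NAIVE_SSN] }

SAMPLES.zip(NAIVE_MATCHES).each do |line, match|
  puts "#{line.inspect} => #{match.inspect}"
end
```

All four lines produce a match, including the two that should be rejected.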

I am at a loss.

I have tried using \A, but it does not do what I need:

\A\d{3}-\d{2}-\d{4}

It matches only:

123-45-1234

It does not match:

John Smith, SSN, 123-45-1234
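A minimal Ruby sketch of why \A behaves this way (constant names are illustrative): \A anchors the match at the very start of the string, so an SSN that appears later in the line can never satisfy it.

```ruby
# Sketch only: START_ANCHORED, AT_START and IN_SENTENCE are illustrative names.
START_ANCHORED = /\A\d{3}-\d{2}-\d{4}/

# SSN at the very start of the string: \A is satisfied.
AT_START = "123-45-1234"[START_ANCHORED]

# SSN later in the string: \A cannot be satisfied there, so the result is nil.
IN_SENTENCE = "John Smith, SSN, 123-45-1234"[START_ANCHORED]

puts AT_START.inspect
puts IN_SENTENCE.inspect
```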

I basically need to catch exactly the string of digits and dashes for an SSN anywhere in a line except when it has a leading or trailing dash.

I have been testing this on rubular.com and cannot find a solution. Every solution I can find uses ^ or \A to anchor at the start of the line, which breaks what I need it to do.

Upvotes: 2

Views: 828

Answers (1)

Avinash Raj

Reputation: 174786

The regex below matches a number in the format xxx-xx-xxxx only when it comes right after whitespace or the start of a line, and is followed by whitespace or the end of a line:

(?<=\s|^)\b\d{3}-\d{2}-\d{4}\b(?=\s|$)


Explanation:

  • (?<=\s|^) Look-behind asserts that the match is immediately preceded by whitespace or the start of a line.
  • \b Matches the boundary between a word character and a non-word character.
  • \d{3}-\d{2}-\d{4} Number format. It must be xxx-xx-xxxx.
  • \b Matches the boundary between a word character and a non-word character.
  • (?=\s|$) Look-ahead asserts that the number is immediately followed by whitespace or the end of a line.
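The pattern explained above can be tried in Ruby against the lines from the question. This is a sketch under assumptions: the constant names and the last sample line are made up for illustration, and the second pattern (NO_DASH_SSN) is not part of this answer. It is a hypothetical variant that uses negative look-arounds to reject only a neighbouring digit or dash, which may be closer to the "no leading or trailing dash" requirement when the SSN touches punctuation.

```ruby
# The pattern from the answer, unchanged.
SPACED_SSN = /(?<=\s|^)\b\d{3}-\d{2}-\d{4}\b(?=\s|$)/

# Assumption, not from the answer: reject only an adjacent digit or dash.
NO_DASH_SSN = /(?<![\d\-])\d{3}-\d{2}-\d{4}(?![\d\-])/

LINES = [
  "123-45-1234",
  "John Smith, SSN, 123-45-1234",
  "100-123-45-1234-99",
  "1010-23-3--123-23-1234-56-712",
  "SSN#123-45-1234."               # hypothetical line: SSN touching punctuation
].freeze

# String#[] with a Regexp returns the first matched substring, or nil.
SPACED_RESULTS  = LINES.map { |line| line[SPACED_SSN] }
NO_DASH_RESULTS = LINES.map { |line| line[NO_DASH_SSN] }

LINES.each_with_index do |line, i|
  puts "#{line.inspect}: spaced=#{SPACED_RESULTS[i].inspect} no_dash=#{NO_DASH_RESULTS[i].inspect}"
end
```

Both patterns reject the two dashed strings from the question. They differ only on the last line, where the SSN is not surrounded by whitespace, so which one fits depends on how strictly the surrounding characters should be constrained.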

Upvotes: 4
