Reputation: 23
I can match an SSN using:
\b\d{3}-\d{2}-\d{4}\b
It easily matches:
123-45-1234
or
John Doe SSN# 123-12-1235
The problem is it will also match:
100-123-45-1234-99
or
1010-23-3--123-23-1234-56-712
The dash is a non-word character, so \b still matches between a dash and a digit. I can't use ^ because the SSN is sometimes inside a sentence or has leading whitespace; it doesn't always start at the beginning of a line.
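To make the problem reproducible outside rubular.com, here is a minimal Ruby sketch. The method name first_word_boundary_match is only an illustrative helper, not part of any library; it returns the first substring the \b pattern finds, or nil.

```ruby
# Hypothetical helper: first substring matched by the \b-delimited pattern.
def first_word_boundary_match(text)
  # String#[] with a Regexp returns the matched substring, or nil if none.
  text[/\b\d{3}-\d{2}-\d{4}\b/]
end

[
  "123-45-1234",
  "John Doe SSN# 123-12-1235",
  "100-123-45-1234-99" # unwanted: the SSN-shaped middle still matches
].each do |text|
  puts "#{text.inspect} => #{first_word_boundary_match(text).inspect}"
end
```

The third input still yields a match because a digit next to a dash is a word/non-word transition, which is all that \b checks.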
I am at a loss.
I have tried using \A but it does not appear to work
\A\d{3}-\d{2}-\d{4}
matches only
123-45-1234
does not match:
John Smith, SSN, 123-45-1234
I basically need to catch exactly the string of digits and dashes for an SSN anywhere in a line except when it has a leading or trailing dash.
I have been testing this in rubular.com and cannot find a solution. All of the solutions I can find relate to using the ^ to identify the start of the line or \A but this breaks what I need it to do.
Upvotes: 2
Views: 828
Reputation: 174786
The regex below looks for numbers in the format xxx-xx-xxxx that come just after a space or the start of a line and are followed by a space or the end of a line.
(?<=\s|^)\b\d{3}-\d{2}-\d{4}\b(?=\s|$)
Explanation:
(?<=\s|^)
Look-behind asserts that the match comes just after a space or the start of a line.
\b
Matches between a word character and a non-word character.
\d{3}-\d{2}-\d{4}
Number format. It must be xxx-xx-xxxx.
\b
Matches between a word character and a non-word character.
(?=\s|$)
Look-ahead checks that what follows the number is a space or the end of a line.
Upvotes: 4
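A rough Ruby sketch for checking the look-around pattern against the inputs from the question. Both method names are illustrative helpers. The second pattern is not from the answer: it is an assumed variant that uses negative look-arounds to reject only a digit or dash next to the number, which may be closer to the "anywhere in a line except with a leading or trailing dash" requirement when the SSN is followed by punctuation.

```ruby
# Pattern from the answer: the SSN must sit between whitespace or line edges.
def whitespace_delimited_ssns(text)
  # String#scan returns an array of all matched substrings (no capture groups here).
  text.scan(/(?<=\s|^)\b\d{3}-\d{2}-\d{4}\b(?=\s|$)/)
end

# Assumed variant (not from the answer): reject only when a digit or dash
# touches either end of the number.
def dash_guarded_ssns(text)
  text.scan(/(?<![\d-])\d{3}-\d{2}-\d{4}(?![\d-])/)
end

samples = [
  "John Smith, SSN, 123-45-1234",
  "100-123-45-1234-99",
  "1010-23-3--123-23-1234-56-712",
  "SSN: 123-45-1234." # trailing period: the two patterns disagree here
]

samples.each do |text|
  puts text.inspect
  puts "  whitespace-delimited: #{whitespace_delimited_ssns(text).inspect}"
  puts "  dash-guarded:         #{dash_guarded_ssns(text).inspect}"
end
```

Which one fits depends on the data: the whitespace version is stricter and skips an SSN followed directly by a comma or period, while the dash-guarded version accepts any neighbor other than a digit or dash.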