Reputation: 275
Let me start by disclaiming that I am horrible with regular expressions. I want to find every instance of a Social Security number in a string and mask everything except the dashes (-) and the last 4 digits of the SSN.
Example
String someStrWithSSN = "This is an SSN,123-31-4321, and here is another 987-65-8765";
Pattern formattedPattern = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");
Matcher formattedMatcher = formattedPattern.matcher(someStrWithSSN);
while (formattedMatcher.find()) {
// Here is my first issue: the pattern is never found
}
// My next issue is that my String should end up looking like this:
// "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765"
The expected result is to find each SSN and mask it. The code above should produce the string "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765".
Upvotes: 2
Views: 1437
Reputation: 2641
You can simplify this by doing something like the following:
String initial = "This is an SSN,123-31-4321, and here is another 987-65-8765";
String processed = initial.replaceAll("\\d{3}\\-\\d{2}(?=\\-\\d{4})","XXX-XX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987-65-8765
This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765
The regex \d{3}\-\d{2}(?=\-\d{4}) matches three digits and two digits separated by a dash, but only when they are followed by a dash and four more digits. That trailing part sits in a lookahead, so it is checked but not consumed, and the replacement leaves it untouched. Using replaceAll with this regex therefore produces the desired masking effect.
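As for why the loop in the question never finds anything: the ^ and $ anchors only let the pattern match at the very start or very end of the string, so an SSN embedded in the middle of a sentence is skipped. Here is a minimal sketch of that, together with a capturing-group alternative to the lookahead. The class and method names (SsnMatcherDemo, countMatches, maskWithGroup) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SsnMatcherDemo {
    // Hypothetical helper: counts how many times the pattern is found in the text.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical helper: same masking as the lookahead version, but the last
    // four digits are captured in group 1 and re-inserted with $1.
    static String maskWithGroup(String text) {
        return text.replaceAll("\\d{3}-\\d{2}-(\\d{4})", "XXX-XX-$1");
    }

    public static void main(String[] args) {
        String text = "This is an SSN,123-31-4321, and here is another 987-65-8765";
        // Pattern from the question, anchored with ^ and $.
        Pattern anchored = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");
        // Same pattern with the anchors removed.
        Pattern unanchored = Pattern.compile("\\d{9}|\\d{3}-\\d{2}-\\d{4}");
        System.out.println(countMatches(anchored, text));
        System.out.println(countMatches(unanchored, text));
        System.out.println(maskWithGroup(text));
    }
}
```

The lookahead and the capturing group should give the same result here; which one to use is mostly a matter of taste.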
Edit:
If you also want 9 consecutive digits to be targeted by this replacement, you can do the following:
String initial = "This is an SSN,123-31-4321, and here is another 987658765";
String processed = initial.replaceAll("\\d{3}\\-\\d{2}(?=\\-\\d{4})","XXX-XX")
.replaceAll("\\d{5}(?=\\d{4})","XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987658765
This is an SSN,XXX-XX-4321, and here is another XXXXX8765
The regex \d{5}(?=\d{4}) matches five digits, but only when they are followed by four more digits (again a lookahead, so those four digits are not consumed). A second call to replaceAll with this regex masks the undashed sequences with the appropriate replacement.
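If you would rather make one pass over the string instead of chaining two replaceAll calls, something along these lines should also work. It is only a sketch: the class name SsnSinglePass and the method mask are hypothetical, and the two regexes above are simply joined with an alternation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SsnSinglePass {
    // Either a dashed prefix (3 digits, dash, 2 digits) or an undashed prefix
    // (5 digits), each only when followed by the last four digits.
    private static final Pattern SSN_PREFIX =
            Pattern.compile("\\d{3}-\\d{2}(?=-\\d{4})|\\d{5}(?=\\d{4})");

    // Hypothetical helper: masks the prefix of every SSN-like sequence.
    static String mask(String input) {
        Matcher m = SSN_PREFIX.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn every digit of the matched prefix into X, keeping any dash.
            m.appendReplacement(out, m.group().replaceAll("\\d", "X"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("This is an SSN,123-31-4321, and here is another 987658765"));
    }
}
```

Compiling the Pattern once as a constant also avoids recompiling the regex on every call, which String.replaceAll does internally.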
Edit: Here's a more robust version of the previous regex, and a longer demonstration of how the new regex works:
String initial = "123-45-6789 is a SSN that starts at the beginning of the string, "
        + "and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These "
        + "have 10+ digits, so they don't match: 123-31-43214, and 98765876545. "
        + "This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. "
        + "-123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is "
        + "preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've "
        + "tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) "
        + "(777-77-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) "
        + "(777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: "
        + "998-76-4321";
String processed = initial.replaceAll("(?<=^|[^-\\d])\\d{3}\\-\\d{2}(?=\\-\\d{4}([^-\\d]|$))","XXX-XX")
.replaceAll("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))","XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
123-45-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) (777-77-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: 998-76-4321
XXX-XX-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, XXX-XX-4321, and here is another XXXXX8765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :XXX-XX-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (XXXXX7777) (XXX-XX-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: XXX-XX-4321
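To try individual cases without the long demo string, the two robust regexes can be wrapped in a small helper. This is just a sketch under the same assumptions as above; the names SsnMasker, DASHED, PLAIN and mask are invented for the example.

```java
import java.util.regex.Pattern;

class SsnMasker {
    // Dashed form: must not be preceded by a dash or digit, and the last four
    // digits must be followed by a non-dash, non-digit character or end of input.
    private static final Pattern DASHED =
            Pattern.compile("(?<=^|[^-\\d])\\d{3}-\\d{2}(?=-\\d{4}([^-\\d]|$))");
    // Undashed form: exactly nine digits, not preceded by a dash or digit.
    private static final Pattern PLAIN =
            Pattern.compile("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))");

    // Hypothetical helper: applies both replacements in sequence.
    static String mask(String input) {
        String dashedMasked = DASHED.matcher(input).replaceAll("XXX-XX");
        return PLAIN.matcher(dashedMasked).replaceAll("XXXXX");
    }

    public static void main(String[] args) {
        String[] samples = {
            "123-45-6789", "(777777777)", "123-31-43214", "-123-31-4321", "1234-56-7890"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + mask(s));
        }
    }
}
```

Only the first two samples should come back masked; the other three are the 10-digit, leading-dash and 4-2-4 cases from the demonstration above and should be returned unchanged.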
Upvotes: 2