Matt

Reputation: 3180

Regex with conditional replacement

I need to write a regex to validate phone numbers with the following criteria:

Return the input as-is if it's fewer than 7 digits. Otherwise, remove the first character if it is a 1 or 0. If we haven't returned yet and the number is < 10 digits, return it. If it's >= 10 digits, return the last 7.

This is performance-critical code converted from coded conditional statements so ideally it can be done in a single regex. I managed to hack together something that got me close but I'm having some trouble meeting all criteria without further breaking things.

(Spaces are just to break things up since there's a lot here).

var pattern = Pattern.compile("(?<=\\A[01]?) ([0-9]{1,9}) (?![0-9]) | (?:[01]?) (?<=\\A[01]?) (?:[0-9]{3,}) ([0-9]{7}) (.*)");
return pattern.matcher(phoneNum).replaceAll("$1$2");

This passes all the test strings I gave it except it doesn't remove the 0 or 1 like it should if they exist as the first character of strings of length 7+.

// Returns input as-is if fewer than 7 digits
    555123  --> 555123    Success

// If 7+ digits remove the first character if it is a 1 or 0
   1234567  --> 234567    Failure, returned 1234567

// If we haven't returned yet and the number is < 10 digits, return
   5551212  --> 5551212   Success

// If it's >= 10 digits, return the last 7
5551234567  --> 1234567   Success

Upvotes: 3

Views: 163

Answers (3)

Pitto

Reputation: 8579

Here's the if version, as suggested in the comments. I've also added your tests as unit tests:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class SomeClass {

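    public String correctPhoneNumber(String number) {
        // Fewer than 7 digits: return the input unchanged.
        if (number.length() < 7) {
            return number;
        }
        // Drop a leading 0 or 1, then keep checking the length.
        if (number.startsWith("0") || number.startsWith("1")) {
            number = number.substring(1);
        }
        // 10 or more digits remaining: keep only the last 7.
        if (number.length() >= 10) {
            return number.substring(number.length() - 7);
        }
        return number;
    }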

    @Test
    void correctPhoneNumberTest() {
        SomeClass objectToTest = new SomeClass();

        assertEquals("555123", objectToTest.correctPhoneNumber("555123"));
        assertEquals("234567", objectToTest.correctPhoneNumber("1234567"));
        assertEquals("5551212", objectToTest.correctPhoneNumber("5551212"));
        assertEquals("1234567", objectToTest.correctPhoneNumber("5551234567"));
    }
    
}

Upvotes: 1

JvdV

Reputation: 75840

Java isn't my forte, but as people have mentioned, regex might not be the right solution to your question. Just in case you are still interested in a regular expression, I think the following covers all your criteria:

^(?:(?=\d{7,9}$)[01]?|\d*(?=\d{7}$)|)(\d+$)

See the online demo


  • ^ - Start-of-string anchor.
  • (?: - Open non-capturing group.
    • (?=\d{7,9}$) - A positive lookahead asserting that 7-9 digits remain up to the end-of-string anchor.
    • [01]? - Optionally match a zero or one.
    • | - Or:
    • \d* - Match as many digits as possible, up until:
    • (?=\d{7}$) - A positive lookahead for 7 digits up to the end-of-string anchor.
    • | - Or: match nothing.
    • ) - Close non-capturing group.
  • (\d+$) - Capture all remaining digits in the 1st capture group, up to the end-of-string anchor.
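
For anyone wanting to try the pattern above from Java, here is a minimal sketch. It assumes the input contains only digits and that the result is obtained by replacing the whole match with capture group 1. The class name PhoneRegexSketch and the helper normalize are made up for illustration; the only change to the pattern is doubling the backslashes for a Java string literal.

```java
import java.util.regex.Pattern;

final class PhoneRegexSketch {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern PHONE =
            Pattern.compile("^(?:(?=\\d{7,9}$)[01]?|\\d*(?=\\d{7}$)|)(\\d+$)");

    // Hypothetical helper: replaces the whole match with capture group 1.
    // Input that does not match (e.g. contains non-digits) is returned unchanged.
    static String normalize(String phoneNum) {
        return PHONE.matcher(phoneNum).replaceFirst("$1");
    }

    public static void main(String[] args) {
        // The four test strings from the question.
        for (String s : new String[] {"555123", "1234567", "5551212", "5551234567"}) {
            System.out.println(s + " --> " + normalize(s));
        }
    }
}
```

One thing to be aware of: the first alternative only applies to inputs of 7 to 9 digits, so a 10-digit input that starts with 0 or 1 falls through to the second alternative and comes back as its last 7 digits, not as the 9 digits left after stripping the prefix. Whether that is what you want depends on how you read the criteria.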

Upvotes: 3

Joop Eggen

Reputation: 109547

A replaceAll with a lambda might be sufficient. The disadvantage is that the lambda is a bit slower, though the regex itself is simpler and faster. It is more maintainable, certainly for real-world business logic. Just time the result in a micro-benchmark.

var pattern = Pattern.compile("\\b(\\d+)\\b");
return pattern.matcher(phoneNum).replaceAll(mr -> {
    String digits = mr.group(1);
    if (digits.length() < 7) { // Or better: use \\d{7,20} in the pattern
        return digits;
    }
    if (digits.startsWith("0") || digits.startsWith("1")) { // Can be optimized
        digits = digits.substring(1);
    }
    if (digits.length() >= 10) {
        digits = digits.substring(digits.length() - 7);
    }
    return digits;
});

Your test cases should be kept as unit tests, as such business rules tend to change "slightly" - especially if you prefer a single regex.
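
As for timing it, here is a rough sketch of what a quick measurement could look like. It is a crude System.nanoTime loop, not a proper benchmark: JIT warm-up and GC make the numbers noisy, so a harness such as JMH would be more trustworthy. The class name PhoneTimingSketch, the helper timeNanos, and the round counts are arbitrary choices for illustration.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

final class PhoneTimingSketch {
    private static final Pattern DIGITS = Pattern.compile("\\b(\\d+)\\b");

    // Lambda-based variant, following the rules from the question.
    static String withLambda(String phoneNum) {
        return DIGITS.matcher(phoneNum).replaceAll(mr -> {
            String digits = mr.group(1);
            if (digits.length() < 7) {
                return digits;
            }
            if (digits.startsWith("0") || digits.startsWith("1")) {
                digits = digits.substring(1);
            }
            if (digits.length() >= 10) {
                digits = digits.substring(digits.length() - 7);
            }
            return digits;
        });
    }

    // Crude timing: total nanoseconds for `rounds` passes over the inputs.
    static long timeNanos(UnaryOperator<String> fn, String[] inputs, int rounds) {
        long sink = 0; // accumulate results so the calls are not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String in : inputs) {
                sink += fn.apply(in).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true here; keeps `sink` observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String[] inputs = {"555123", "1234567", "5551212", "5551234567"};
        timeNanos(PhoneTimingSketch::withLambda, inputs, 10_000); // warm-up pass
        long ns = timeNanos(PhoneTimingSketch::withLambda, inputs, 100_000);
        System.out.println("lambda version: " + ns + " ns");
    }
}
```

Any other implementation with the same String-to-String shape can be passed to timeNanos in the same way to compare it against the lambda version.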

Upvotes: 1
