Matt Clark

Reputation: 28629

Matching similar patterns with a single regex

I am currently writing some tests that validate data in a specific format, and I am trying to do the following.

The field under test will contain data in one of the following formats:

Uncached response, with IP and port of the responding server

xxx.xxx.xxx.xxx:xxxx

Partial cache hit, with IP and port of the responding server

xxx.xxx.xxx.xxx:xxxx:cached

Or a full cache hit

cached

I really don't care what the data is, just that it matches one of these formats.

I have an expression to match a host and port,

(([01]?\d\d?|2[0-4]\d|25[0-5]).){3}([01]?\d\d?|2[0-4]\d|25[0-5]):0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])

I could easily append :cached to match the second format, or just look for cached on its own, but that would require three separate validations.

How could I match any of these formats using a single regex? Is there an optional flag? Match cached, the ip/port regex, or both?

Upvotes: 1

Views: 74

Answers (2)

Bohemian

Reputation: 425238

You can make all terms optional, but that leaves open the possibility of a blank matching. That can be prevented by adding a lookahead.

To make the regex clearer, I'll use a placeholder for the IP part:

^(?!$)(<IP-REGEX>)?(((?<=^)|(?<!^):)cached)?$

Using a simpler regex for the ip (not range-checking, just checking the "x" in your example is any digit), the whole thing would be:

^(?!$)((\d{3}\.){3}\d{3}:\d{1,5})?(((?<=^)|(?<!^):)cached)?$

See live demo, matching:

111.222.333.444:5555
111.222.333.444:5555:cached
cached

and not matching:

111.222.333.444:5555cached
:
:cached

FYI, the regex (?!$) is a negative look ahead, anchored to start, that asserts the following input is not the end (ie, the input isn't empty).
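To illustrate what the lookahead buys, here is a minimal Java sketch. It assumes a deliberately simplified stand-in for the optional terms (a run of digits instead of the IP/port part); the class and field names are hypothetical and not part of the answer above.

```java
import java.util.regex.Pattern;

public class EmptyInputGuard {
    // Simplified stand-in for the optional terms (assumption: digits replace the IP/port part).
    static final Pattern WITHOUT_GUARD = Pattern.compile("^(\\d+)?(cached)?$");
    // Same pattern with the negative lookahead (?!$) placed right after the start anchor.
    static final Pattern WITH_GUARD = Pattern.compile("^(?!$)(\\d+)?(cached)?$");

    // Returns true when the whole input matches the pattern.
    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(WITHOUT_GUARD, ""));     // true: every term is optional
        System.out.println(matches(WITH_GUARD, ""));        // false: the lookahead rejects empty input
        System.out.println(matches(WITH_GUARD, "cached"));  // true
    }
}
```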

Note that I added an alternation (with lookarounds) immediately before "cached". It matches either the start of input, or a colon that is not at the start of input. This prevents the colon from being simply optional, which would allow a missing colon (i.e. ip:portcached).
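As a rough check of the colon handling, the following Java sketch compares the simplified regex with a variant where the colon is merely optional. The NAIVE pattern and all class and method names are assumptions introduced here for comparison only.

```java
import java.util.regex.Pattern;

public class ColonAlternationDemo {
    // Simplified ip:port part from the answer (three digits per group, no range checking).
    static final String IP_PORT = "(\\d{3}\\.){3}\\d{3}:\\d{1,5}";

    // Hypothetical naive variant: the colon before "cached" is simply optional.
    static final Pattern NAIVE =
        Pattern.compile("^(?!$)(" + IP_PORT + ")?(:?cached)?$");

    // Variant with the lookaround alternation: no colon at the start, mandatory colon elsewhere.
    static final Pattern GUARDED =
        Pattern.compile("^(?!$)(" + IP_PORT + ")?(((?<=^)|(?<!^):)cached)?$");

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "111.222.333.444:5555",
            "111.222.333.444:5555:cached",
            "cached",
            "111.222.333.444:5555cached",
            ":",
            ":cached"
        };
        for (String input : inputs) {
            System.out.println(input + " naive=" + matches(NAIVE, input)
                + " guarded=" + matches(GUARDED, input));
        }
    }
}
```

The naive variant accepts both 111.222.333.444:5555cached and :cached, while the guarded one rejects them and still accepts the three intended formats.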

Upvotes: 2

Rodrigo V

Reputation: 101

Attention: The original regex for <ip>:<port> has a problem. The dot (.) must be escaped (\. in the regex, written as \\. in a Java string literal), otherwise the regex will also accept the value “10A251B251C251:65535”.

I tested both solutions, presented by @Joe DeRose and @Bohemian, and both seem to work well in Java. Below is the code I used to test some scenarios.
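The effect of the unescaped dot can be reproduced in isolation. This is a small sketch that leaves out the port part for brevity; the class, constant, and method names are hypothetical.

```java
public class DotEscapeDemo {
    // One octet in the range 0-255, as in the question's regex.
    static final String OCTET = "([01]?\\d\\d?|2[0-4]\\d|25[0-5])";

    // Unescaped dot: "." matches any character between octets.
    static final String UNESCAPED = "(" + OCTET + ".){3}" + OCTET;

    // Escaped dot: "\\." in the Java literal is "\." in the regex, a literal dot.
    static final String ESCAPED = "(" + OCTET + "\\.){3}" + OCTET;

    // String.matches requires the whole input to match.
    static boolean isIp(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isIp(UNESCAPED, "10A251B251C251")); // true: letters accepted as separators
        System.out.println(isIp(ESCAPED, "10A251B251C251"));   // false
        System.out.println(isIp(ESCAPED, "10.251.251.251"));   // true
    }
}
```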

public class CachedPatternTest {

    // IPv4 address (each octet 0-255) followed by a port (0-65535); the dots are escaped.
    private static final String IP_PORT_PATTERN = 
        "(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}" +
        "([01]?\\d\\d?|2[0-4]\\d|25[0-5]):" +
             "0*(?:6553[0-5]|" +    
             "655[0-2][0-9]|" +
             "65[0-4][0-9]{2}|" +
             "6[0-4][0-9]{3}|" +
             "[1-5][0-9]{4}|" +
             "[1-9][0-9]{1,3}|" +
             "[0-9])";

    // Solution 1: plain alternation of the three accepted formats.
    private static final String CACHED_PATTERN_1 = 
        "("+IP_PORT_PATTERN+"|"+IP_PORT_PATTERN+":cached|cached)";

    // Solution 2: optional terms, a lookahead that rejects empty input, and
    // lookarounds that require the colon exactly when an ip:port precedes "cached".
    private static final String CACHED_PATTERN_2 = 
        "^(?!$)("+IP_PORT_PATTERN+")?(((?<=^)|(?<!^):)cached)?$";

    // Prints whether the whole string matches the regex.
    private static void check(String str, String regex) {
        System.out.println(str+"? "+ str.matches(regex));
    }

    public static void main(String[] args) {

        //String regex = CACHED_PATTERN_1;
        String regex = CACHED_PATTERN_2;

        System.out.println("Those must pass...");
        check("100.100.100.100:100", regex);
        check("10.251.251.251:65535", regex);
        check("10.251.251.251:65535:cached", regex);
        check("cached", regex);

        System.out.println("\nThose must fail...");
        check(":cached", regex);
        check("10A251B251C251:65535", regex);
        check("10.251.251.251:65535cached", regex);  // missing colon before "cached"
    }
}

Upvotes: 0
