
Reputation: 55564

Regex for specific url format

I am trying to write a regex to match a specific URL format, specifically the API URLs for Stack Exchange sites. For example, I want both of these to match:

http://api.stackoverflow.com/1.1/questions/1234/answers  
http://api.physics.stackexchange.com/1.0/questions/5678/answers

Where:

- everything except the three variable parts (shown in bold in the original post: the site name, the digit after "1.", and the question ID) must be identical;
- the first part (the site name) can only be made of the letters a to z, with either one full stop or none;
  - it would also be good if a full stop, when present, had to be followed by the word "stackexchange". However, this isn't crucial;
- the second part can only be a 1 or a 0;
- the last part can only be made of the digits 0 to 9, and can be of any length;
- there can't be anything at all before or after the URL, not even a trailing slash.
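One way to see how these rules map onto a pattern is to build it one requirement at a time. The sketch below is only illustrative: the class name ApiUrlSpec and the rejected sample URLs are hypothetical, and it applies the optional "stackexchange must follow the full stop" rule. Pattern.quote() is used so that the fixed parts of the URL are matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one regex fragment per requirement in the question.
final class ApiUrlSpec {
    // Site name: lowercase letters, optionally followed by ".stackexchange".
    static final String SITE = "[a-z]+(?:\\.stackexchange)?";
    // Minor API version: only 0 or 1 after the literal "1."
    static final String VERSION = "1\\.[01]";
    // Question ID: one or more ASCII digits, any length.
    static final String QUESTION_ID = "[0-9]+";

    // The fixed parts are quoted so their dots are literal, not wildcards.
    static final Pattern API_URL = Pattern.compile(
            Pattern.quote("http://api.") + SITE
            + Pattern.quote(".com/") + VERSION
            + Pattern.quote("/questions/") + QUESTION_ID
            + Pattern.quote("/answers"));

    // matches() must consume the whole string, so nothing may come
    // before or after the URL (not even a trailing slash).
    static boolean isApiUrl(String s) {
        return API_URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://api.stackoverflow.com/1.1/questions/1234/answers",
            "http://api.physics.stackexchange.com/1.0/questions/5678/answers",
            "http://api.stackoverflow.com/1.1/questions/1234/answers/", // trailing slash
            "http://api.stackoverflow.com/1.2/questions/1234/answers",  // version not 1.0/1.1
            "http://api.physics.example.com/1.0/questions/5678/answers" // not stackexchange
        };
        for (String s : samples) {
            System.out.println(isApiUrl(s) + "  " + s);
        }
    }
}
```

Only the first two samples should print true; the last three each break one of the rules above.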

Upvotes: 1

Answers (3)

ridgerunner

Reputation: 34395

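This tested Java program has a commented regex (compiled with Pattern.COMMENTS, so whitespace and # comments inside the pattern are ignored) which should do the trick for the two sites in the question: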

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "http://api.stackoverflow.com/1.1/questions/1234/answers";

        Pattern p = Pattern.compile(
            "http://api\\.              # Scheme and api subdomain.\n" +
            "(?:                        # Group for domain alternatives.\n" +
            "  stackoverflow            # Either one\n" +
            "| physics\\.stackexchange  # or the other\n" +
            ")                          # End group for domain alternatives.\n" +
            "\\.com                     # TLD\n" +
            "/1\\.[01]                  # Either 1.0 or 1.1\n" +
            "/questions/\\d+/answers    # Rest of path.", 
            Pattern.COMMENTS);
        Matcher m = p.matcher(s);
        if (m.matches()) {
            System.out.print("Match found.\n");
        } else {
            System.out.print("No match found.\n");
        }
    }
}
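The pattern above has no ^ or $ anchors because it is used with Matcher.matches(), which succeeds only if the entire input matches. The sketch below (class name and sample strings are illustrative) contrasts that with Matcher.find(), which accepts the pattern anywhere in the input and would therefore let extra text before or after the URL through.

```java
import java.util.regex.Pattern;

final class MatchesVsFind {
    // Same unanchored pattern as the answer above, written without COMMENTS mode.
    static final Pattern P = Pattern.compile(
            "http://api\\.(?:stackoverflow|physics\\.stackexchange)\\.com"
            + "/1\\.[01]/questions/\\d+/answers");

    // matches(): the whole input must match the pattern.
    static boolean whole(String s) {
        return P.matcher(s).matches();
    }

    // find(): succeeds if the pattern occurs anywhere in the input.
    static boolean anywhere(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String exact = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        String padded = "see " + exact + "/";   // extra text before and after
        System.out.println(whole(exact) + " " + anywhere(exact));   // true true
        System.out.println(whole(padded) + " " + anywhere(padded)); // false true
    }
}
```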

Upvotes: 0

trutheality

Reputation: 23465

^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$

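^ matches the start of the input and $ matches the end of the input (in Java it also matches just before a final line terminator). [.] is an alternative to a backslash for escaping the dot; in a Java string literal the backslash would itself need escaping, written as \\.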
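The difference between $ and a strict end-of-input anchor only shows up when the pattern is used with Matcher.find(); with matches() the whole input has to be consumed either way. This minimal sketch (the class name and the sample with a trailing newline are illustrative) compares the pattern above with a copy that ends in \z instead of $.

```java
import java.util.regex.Pattern;

final class DollarVsZ {
    // The pattern above, as a Java string literal.
    static final Pattern DOLLAR = Pattern.compile(
            "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$");
    // Same pattern ending in \z, which matches only at the very end of input.
    static final Pattern END_Z = Pattern.compile(
            "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers\\z");

    public static void main(String[] args) {
        String withNewline = "http://api.stackoverflow.com/1.1/questions/1234/answers\n";
        // With find(), $ may match just before the final line terminator...
        System.out.println(DOLLAR.matcher(withNewline).find());    // true
        // ...but \z cannot, so the trailing newline is rejected.
        System.out.println(END_Z.matcher(withNewline).find());     // false
        // matches() requires the whole input, so it rejects it even with $.
        System.out.println(DOLLAR.matcher(withNewline).matches()); // false
    }
}
```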

Upvotes: 1

Mike Samuel

Reputation: 120516

Pattern.compile("^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z")

The ^ makes sure the match starts at the start of the input, and the \z (written \\z in the string literal) makes sure it ends at the very end of the input; unlike $, it does not allow a trailing line terminator. All the dots are escaped so they are literal. The (?i:...) part makes the scheme and domain case-insensitive, as per the URL spec. The [01] only matches the characters 0 or 1. The [0-9]+ matches one or more ASCII digits. The rest is self-explanatory.
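To illustrate the scope of the inline (?i:...) flag, the sketch below (class name and test URLs are illustrative) applies the pattern above to a URL with an upper-case scheme and host, which is accepted, and to one with an upper-case path segment, which is rejected because the path lies outside the flagged group. Note that inside the group, [a-z] also matches upper-case letters.

```java
import java.util.regex.Pattern;

final class InlineCaseFlag {
    // The pattern above: (?i:...) covers only the scheme and host.
    static final Pattern API = Pattern.compile(
            "^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)"
            + "/1\\.[01]/questions/[0-9]+/answers\\z");

    // ^ and \z anchor the match, so find() acts as a whole-string check.
    static boolean isApi(String s) {
        return API.matcher(s).find();
    }

    public static void main(String[] args) {
        // Host part is case-insensitive, so this is accepted.
        System.out.println(isApi("HTTP://API.StackOverflow.COM/1.1/questions/1234/answers")); // true
        // Path part is case-sensitive, so "Questions" is rejected.
        System.out.println(isApi("http://api.stackoverflow.com/1.1/Questions/1234/answers")); // false
    }
}
```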

Upvotes: 5
