Reputation: 55564
I am trying to write a regex that matches a specific URL format: the API URLs for Stack Exchange sites. For example, I want both of these to match:
http://api.stackoverflow.com/1.1/questions/1234/answers
http://api.physics.stackexchange.com/1.0/questions/5678/answers
Where
Upvotes: 1
Views: 4167
Reputation: 34395
This tested Java program has a commented regex which should do the trick:
import java.util.regex.*;

public class TEST {
    public static void main(String[] args) {
        String s = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        // Pattern.COMMENTS makes the regex ignore whitespace and treat
        // everything from '#' to the end of the line as a comment.
        Pattern p = Pattern.compile(
            "http://api\\.              # Scheme and api subdomain.\n" +
            "(?:                        # Group for domain alternatives.\n" +
            "  stackoverflow            # Either one\n" +
            "| physics\\.stackexchange  # or the other\n" +
            ")                          # End group for domain alternatives.\n" +
            "\\.com                     # TLD\n" +
            "/1\\.[01]                  # Either 1.0 or 1.1\n" +
            "/questions/\\d+/answers    # Rest of path.",
            Pattern.COMMENTS);
        Matcher m = p.matcher(s);
        // matches() succeeds only if the whole input matches the pattern.
        if (m.matches()) {
            System.out.print("Match found.\n");
        } else {
            System.out.print("No match found.\n");
        }
    }
}
Upvotes: 0
Reputation: 23465
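^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$
^ matches the start of the string, $ matches the end of the string (in Java, by default it also matches just before a final line terminator), and [.] is an alternative to a backslash for escaping the dot (in a Java string literal the backslash would itself need to be escaped, as \\.).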
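As a rough sketch of how the pattern above might be used from Java: the class name ApiUrlCheck and the helper isAnswersUrl are made up for illustration and are not part of the answer. Because Matcher.matches() already requires the whole input to match, the ^ and $ anchors are redundant here but harmless.

```java
import java.util.regex.Pattern;

class ApiUrlCheck {
    // The regex from the answer as a Java string literal.
    // [.] needs no extra escaping, unlike a backslash-escaped dot (\\.).
    static final Pattern ANSWERS_URL = Pattern.compile(
        "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$");

    // Hypothetical helper: true if the whole string is an answers URL.
    static boolean isAnswersUrl(String url) {
        return ANSWERS_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Both sample URLs from the question should be accepted.
        System.out.println(isAnswersUrl("http://api.stackoverflow.com/1.1/questions/1234/answers"));
        System.out.println(isAnswersUrl("http://api.physics.stackexchange.com/1.0/questions/5678/answers"));
        // A version other than 1.0 or 1.1 should be rejected.
        System.out.println(isAnswersUrl("http://api.stackoverflow.com/2.0/questions/1234/answers"));
    }
}
```

Note that [a-z]+ accepts any lowercase site name, so this pattern is broader than one that lists the allowed domains explicitly.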
Upvotes: 1
Reputation: 120516
Pattern.compile("^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z")
The ^ makes sure it starts at the start of input, and the \\z makes sure it ends at the end of input. All the dots are escaped so they are literal. The (?i:...) part makes the domain and scheme case-insensitive as per the URL spec. The [01] only matches the characters 0 or 1. The [0-9]+ matches one or more Arabic digits. The rest is self-explanatory.
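To illustrate why \\z is used rather than the more common $ end anchor, here is a small sketch (the class and field names are invented for this example). With find(), a pattern ending in $ still accepts input that has a trailing newline, while \\z does not. It also shows the (?i:...) group accepting a mixed-case host.

```java
import java.util.regex.Pattern;

class EndAnchorDemo {
    // The pattern from the answer, minus its end anchor.
    static final String PREFIX =
        "^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)"
        + "/1\\.[01]/questions/[0-9]+/answers";

    static final Pattern WITH_DOLLAR = Pattern.compile(PREFIX + "$");
    static final Pattern WITH_Z = Pattern.compile(PREFIX + "\\z");

    public static void main(String[] args) {
        String url = "http://API.StackOverflow.com/1.1/questions/1234/answers";

        // Scheme and host are matched case-insensitively.
        System.out.println(WITH_Z.matcher(url).find());             // true

        // $ also matches just before a final line terminator...
        System.out.println(WITH_DOLLAR.matcher(url + "\n").find()); // true

        // ...whereas \z only matches at the very end of input.
        System.out.println(WITH_Z.matcher(url + "\n").find());      // false
    }
}
```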
Upvotes: 5