Reputation: 115
I'm trying to write a rule to match on a top-level domain (TLD) followed by five digits. My problem is that my existing PCRE matches what I have described, but much later in the URL than where I want it to. I want it to match on the first occurrence of a TLD, not anywhere else. The easy way to check for this is to match on the TLD only when it has not been preceded at some point by the "/" character. I tried using a negative lookbehind, but that doesn't work because it only looks back a single character.
e.g.: How it is currently working
domain.net/stuff/stuff=www.google.com/12345
matches .com/12345 even though I do not want this match because it is not the first TLD in the URL
e.g.: How I want it to work
domain.net/12345/stuff=www.google.com/12345
matches on .net/12345 and ignores the later match on .com/12345
My current expression
(\.[a-z]{2,4})/\d{5}
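For anyone who wants to reproduce the behaviour, here is a minimal sketch in Python. It assumes the expression is applied as an unanchored search (as `re.search` does); the helper name `first_hit` is made up for this illustration.

```python
import re

# The expression from the question, applied as an unanchored search.
CURRENT = re.compile(r"(\.[a-z]{2,4})/\d{5}")

def first_hit(url):
    """Hypothetical helper: return the text of the first match, or None."""
    m = CURRENT.search(url)
    return m.group(0) if m else None

# Unwanted: the only TLD followed by five digits is the late one.
print(first_hit("domain.net/stuff/stuff=www.google.com/12345"))  # .com/12345
# Wanted: the first TLD is followed by five digits.
print(first_hit("domain.net/12345/stuff=www.google.com/12345"))  # .net/12345
```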
EDIT: rewrote it so perhaps the problem is clearer in case anyone in the future has this same issue.
Upvotes: 1
Views: 300
Reputation: 784918
You can use this regex:
'|^(\w+://)?([\w-]+\.)+\w+/\d{5}|'
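As a rough check of this pattern, here is a sketch in Python. It assumes the surrounding | characters are only regex delimiters (as in PHP's preg functions) and drops them; the function name `leading_host_match` is invented for the example.

```python
import re

# Same pattern without the | delimiters: optional scheme, dotted host,
# then "/" and five digits, all anchored at the start of the string.
ANCHORED = re.compile(r"^(\w+://)?([\w-]+\.)+\w+/\d{5}")

def leading_host_match(url):
    """Hypothetical helper: return the matched prefix, or None."""
    m = ANCHORED.search(url)
    return m.group(0) if m else None

print(leading_host_match("domain.net/12345/stuff=www.google.com/12345"))
print(leading_host_match("domain.net/stuff/stuff=www.google.com/12345"))
print(leading_host_match("http://www.example.com/12345"))
```

Because of the ^ anchor, the second URL should produce no match at all, while the optional scheme group lets the third one through.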
Upvotes: 0
Reputation: 9618
You're pretty close :)
You just need to be sure that before matching what you're looking for (i.e. (\.[a-z]{2,4})/\d{5}), you haven't met any / since the beginning of the line.
I would suggest you simply prepend ^[^\/]* to your current regex, moving the \. outside the capturing group.
Thus, the resulting regex would be:
^[^\/]*\.([a-z]{2,4})/\d{5}
How does it work?
^ asserts that this is the beginning of the tested string.
[^\/]* accepts any sequence of characters that doesn't contain /.
\.([a-z]{2,4})/\d{5} is the pattern you want to match (a . followed by 2 to 4 lowercase letters, then a / and five digits).
Here is a permalink to a working example on regex101.
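If you want to try it outside regex101, the following is a small Python sketch of the same idea. The function name `first_tld` is just an illustration, and the sample URLs are the ones from the question plus one with a scheme that I added as an assumption.

```python
import re

# Anchored version: nothing but non-"/" characters may come before the TLD.
FIRST_TLD = re.compile(r"^[^/]*\.([a-z]{2,4})/\d{5}")

def first_tld(url):
    """Hypothetical helper: return the captured TLD (without the dot), or None."""
    m = FIRST_TLD.search(url)
    return m.group(1) if m else None

print(first_tld("domain.net/12345/stuff=www.google.com/12345"))  # net
print(first_tld("domain.net/stuff/stuff=www.google.com/12345"))  # None
print(first_tld("http://domain.net/12345"))                      # None
```

Note the last case: since [^/]* refuses any /, a URL that starts with a scheme such as http:// will not match. If your input can include a scheme, you would need to allow for it before this part of the pattern.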
Cheers!
Upvotes: 1