How to get regex of URL that does not have a word as a token?

Question

How do I match a URL that meets all of these conditions:

the domain is example.com, but the subdomain is not blog (i.e. not blog.example.com)
the first URL token is not "news", "archives", or "blog" (i.e. example.com/FIRST_URL_TOKEN)
none of the subsequent URL tokens is "blog" (i.e. example.com/FIRST_URL_TOKEN/SUBSEQUENT_URL_TOKEN/SUBSEQUENT_URL_TOKEN)

So:

http://example.com/test should match

http://blog.example.com/test should not match

http://example.com/test/blog/test should not match

http://example.com/test/test2 should match

Here is what I have so far:

regex = /^http(s)?:\/\/(?!blog\.$)example.com(\.\w+)?\/(?!news$|archive$|blog$).*/

However, I'm missing something, because http://example.com/test/blog/test still matches when it should not.

ndnenkov · Accepted Answer

%r{^https?://[^/]*(?




There were quite a few problems with your original regex. Mainly, $ doesn't mean what I think you think it means, and you were not excluding blog/.
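
To make the point about $ concrete, here is a small sketch that runs the pattern from the question against a few URLs. The pattern is copied from the question unchanged; the test URLs other than the ones in the question are made up for illustration. Inside a lookahead, `news$` only rejects "news" when it is immediately followed by the end of the line, so a longer path slips through.

```ruby
# The pattern from the question, copied as-is.
original = /^http(s)?:\/\/(?!blog\.$)example.com(\.\w+)?\/(?!news$|archive$|blog$).*/

# "news" is followed by the end of the line, so (?!news$) rejects it.
original.match?("http://example.com/news")            # => false

# "news" is followed by "/", so (?!news$) passes and the URL matches.
original.match?("http://example.com/news/today")      # => true

# Nothing in the pattern looks at later tokens, so "blog" further down matches.
original.match?("http://example.com/test/blog/test")  # => true

# Subdomains fail only because "example.com" must come right after "://",
# not because of (?!blog\.$); any other subdomain is rejected as well.
original.match?("http://blog.example.com/test")       # => false
original.match?("http://www.example.com/test")        # => false
```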

So here is a breakdown:


There is an alternative syntax for creating regexes, %r{}; use it if you would otherwise have to escape a lot of forward slashes
^ - from the start
https?:// - http:// or https://
[^/]* - multiple characters, which are not forward slashes
(?<!...) - negative lookbehind to ensure the subdomain was not blog.example.com

example\.com - the example.com domain itself
/(?!news/|archives/|blog/) - after the first slash, the "url token" is not news or archives or blog
(?!.*/blog(/|$)) - none of the further "url tokens" is blog
.* - match the remaining characters
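
Putting the pieces of the breakdown together gives something like the sketch below. Treat it as a reconstruction, not as the exact pattern: the body of the lookbehind, `(?<!blog\.)`, is an assumption inferred from its description, and `URL_PATTERN` and `allowed?` are names invented for this example. The four URLs are the ones from the question.

```ruby
# Assembled from the breakdown above. The lookbehind body (?<!blog\.) is an
# assumption; the remaining pieces are taken from the breakdown as written.
URL_PATTERN = %r{^https?://[^/]*(?<!blog\.)example\.com/(?!news/|archives/|blog/)(?!.*/blog(/|$)).*}

# Hypothetical helper: true when the URL satisfies all three conditions.
def allowed?(url)
  URL_PATTERN.match?(url)
end

allowed?("http://example.com/test")            # => true
allowed?("http://blog.example.com/test")       # => false (blog subdomain)
allowed?("http://example.com/test/blog/test")  # => false (later token is "blog")
allowed?("http://example.com/test/test2")      # => true
allowed?("http://example.com/news/today")      # => false (first token is "news")
```

One thing to be aware of: because the first lookahead requires a trailing slash, a URL such as http://example.com/news (no slash after the token) still matches. If that matters, writing the alternatives as `news(/|$)` and so on would be one way to tighten it.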
