CADmageren

Reputation: 45

Regex URL pattern except specific subsite

I am working on a web crawler and am trying to write a regex that supports the following.

Match: all pages starting with

   http://intranet/

But not starting with

    http://intranet/sites/ and http://intranet/search/

And in the subfolder /Pages/, ending with .aspx

Valid sample: 
http://intranet/products/Pages/default.aspx
Invalid samples:
http://intranet/Pages/sofus/default.aspx
http://intranet/sites/products/Pages/default.aspx
http://intranet/products/Pages/default.aspx#

So far I have come up with this:

 ^http://intranet.*/Pages/.*.aspx+

Any help appreciated.

Upvotes: 3

Views: 278

Answers (1)

p.s.w.g

Reputation: 149040

A pattern like this should work:

^http://intranet/(?!sites|search)[^/]+/Pages/.*\.aspx$

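The (?!...) creates what's known as a negative lookahead assertion and ensures that the segment matched by [^/]+ does not start with sites or search. The \. matches a literal dot, and the trailing $ anchors the match at the end of the URL, so a URL ending in # is rejected.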

Here's a demonstration.
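
To check the pattern against the sample URLs from the question, here is a minimal sketch in Python. The language is an assumption, since the question doesn't say what the crawler is written in, and the helper name is_crawlable is hypothetical.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^http://intranet/(?!sites|search)[^/]+/Pages/.*\.aspx$")

# Sample URLs copied from the question.
SAMPLES = [
    "http://intranet/products/Pages/default.aspx",
    "http://intranet/Pages/sofus/default.aspx",
    "http://intranet/sites/products/Pages/default.aspx",
    "http://intranet/products/Pages/default.aspx#",
]


def is_crawlable(url: str) -> bool:
    """Return True if the URL matches the pattern (hypothetical helper)."""
    return PATTERN.match(url) is not None


for url in SAMPLES:
    # Prints True or False next to each sample URL.
    print(is_crawlable(url), url)
```

Note that the lookahead rejects any first path segment that merely begins with sites or search, not only those exact names. If only the exact folders should be excluded, (?!(?:sites|search)/) is one possible variation.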

Upvotes: 4
