Shailen Sukul

Reputation: 510

Regex Expression for a URL Whitelist

This has been driving me crazy.

I need to construct a single regular expression that whitelists the URLs my site is allowed to link to. Allowed URLs should be of the form:

*.microsoft.com/*

So the following urls are valid:

http://digital.microsoft.com/audio/somefile.wmv
http://sharepoint.microsoft.com/pages/p1

And the following invalid:

http://badsite.microsoft.com.me
http://www.microsoft.com.me/runthis

I need a regular expression that allows valid Microsoft sites to be linked to, but blocks malicious sites that may submit links containing the words microsoft.com.

Any help is appreciated!

UPDATE

Based on the answer by @ruakh, I was able to tweak the expression to match my scenario: I will mark his post as the answer.

Expression: ^[a-zA-Z]+://([^/]+[.])?(microsoft[.]com|MICROSOFT[.]COM)(/.*)?$

This expression correctly matches the following:

And correctly does not match the following:

Upvotes: 3

Views: 7245

Answers (2)

Alexey Kazakov

Reputation: 351

A slightly more sophisticated regex: ^[a-zA-Z]+://([^/?#]+[.])?(microsoft[.]com|MICROSOFT[.]COM)(/.*)?$

Use it if you also don't want to match URLs where microsoft.com appears only in the query string or fragment, such as:

http://go.something.com?go.microsoft.com
http://go.something.com?param=go.microsoft.com
http://go.something.com#go.microsoft.com
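
A quick way to check this behaviour is a small Python sketch. It is only an illustration, not part of the original answer: the function name `is_whitelisted` is hypothetical, and the pattern used here excludes `/`, `?` and `#` from the host part and treats the microsoft.com group as required.

```python
import re

# Assumption: host part may not contain '/', '?' or '#',
# and the microsoft.com group is mandatory (not optional).
PATTERN = re.compile(
    r"^[a-zA-Z]+://([^/?#]+[.])?(microsoft[.]com|MICROSOFT[.]COM)(/.*)?$"
)


def is_whitelisted(url: str) -> bool:
    """Return True if the URL's host is microsoft.com or one of its subdomains."""
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    samples = [
        "http://digital.microsoft.com/audio/somefile.wmv",
        "http://sharepoint.microsoft.com/pages/p1",
        "http://badsite.microsoft.com.me",
        "http://go.something.com?go.microsoft.com",
        "http://go.something.com?param=go.microsoft.com",
        "http://go.something.com#go.microsoft.com",
    ]
    for url in samples:
        print(url, is_whitelisted(url))
```

Only the first two sample URLs should be accepted; the query-string and fragment variants are rejected because the host part can no longer run past a `?` or `#`.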

Upvotes: 2

ruakh

Reputation: 183371

I think it would be better to use a URL-parsing library, but since you say you need "a single regex expression" (emphasis mine), I take it that, for some externally-driven reason, you really need to do this in a regex? In that case, I'd probably write something like:

^(https?|mms)://([^/]+[.])?(?i:microsoft[.]com)(/.*)?$
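
For comparison, here is a rough Python sketch of both approaches: the regex above, and the URL-parsing alternative using the standard library's `urllib.parse.urlsplit`. The function names and the `ALLOWED_SCHEMES` / `ALLOWED_DOMAIN` constants are hypothetical, and the scheme list is an assumption copied from the regex. The scoped `(?i:...)` flag needs Python 3.6 or later.

```python
import re
from urllib.parse import urlsplit

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^(https?|mms)://([^/]+[.])?(?i:microsoft[.]com)(/.*)?$")

# Assumptions for the parser-based check: same schemes as the regex.
ALLOWED_SCHEMES = {"http", "https", "mms"}
ALLOWED_DOMAIN = "microsoft.com"


def allowed_by_regex(url: str) -> bool:
    """Check the URL against the single-regex whitelist."""
    return PATTERN.match(url) is not None


def allowed_by_parser(url: str) -> bool:
    """Check the URL by parsing it and comparing the host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on some malformed URLs; treat as not allowed.
        return False
    # urlsplit lower-cases the scheme; hostname is lower-cased and excludes
    # any port, user info, query string, or fragment.
    host = parts.hostname or ""
    return parts.scheme in ALLOWED_SCHEMES and (
        host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)
    )


if __name__ == "__main__":
    for url in [
        "http://digital.microsoft.com/audio/somefile.wmv",
        "http://www.microsoft.com.me/runthis",
        "http://go.something.com#go.microsoft.com",
    ]:
        print(url, allowed_by_regex(url), allowed_by_parser(url))
```

Both checks accept the valid URLs from the question and reject the `microsoft.com.me` ones. They differ on a URL such as http://go.something.com#go.microsoft.com: the regex accepts it because `[^/]+` can run through the `#`, while the parser-based check looks only at the actual host name and rejects it.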

Upvotes: 2
