Reputation: 869
Hey guys. I'm trying to put together a regex to match all URLs like these:
and to match URLs that contain '#' or '?' only up to the character before either of them. For example, http://example.com/index.php?p=Hey -> http://example.com/index.php
The regex I have so far works well when matching only certain file types or a folder, except in one case:
Any help is appreciated. Thanks everyone.
This is the regex:
^(?<protocol>http(s?))://(?<domain>[^/\r\n#?]+)(?<path>/[^?#]*(?:html|php|/))?
Upvotes: 1
Views: 322
Reputation: 8560
This might do what you want:
^(?<protocol>http(s?))://(?<domain>[^/\s#?]+)(?<path>/[^\s#?]*)?(?<query>.*)?
The query group captures the rest of the URL (typically everything from the first '?' or '#' onward), which you can simply ignore.
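As a minimal sketch of how this pattern might be applied, here it is in Python. Python is an assumption, since the question does not name a language. Python writes named groups as (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly, and the helper name strip_query is hypothetical.

```python
import re

# The pattern from this answer, with named groups rewritten in Python's
# (?P<name>...) syntax. Whatever follows the path (normally the part
# starting at the first '?' or '#') lands in the "query" group.
URL_RE = re.compile(
    r"^(?P<protocol>http(s?))://"
    r"(?P<domain>[^/\s#?]+)"
    r"(?P<path>/[^\s#?]*)?"
    r"(?P<query>.*)?"
)


def strip_query(url: str) -> str:
    """Rebuild the URL from protocol, domain and path only (hypothetical helper)."""
    m = URL_RE.match(url)
    if m is None:
        return url  # not an http(s) URL: leave it unchanged
    # The path group is optional, so it may be None for e.g. http://example.com#top
    return f"{m.group('protocol')}://{m.group('domain')}{m.group('path') or ''}"


print(strip_query("http://example.com/index.php?p=Hey"))
```

Rebuilding the string from the three named groups, rather than removing the query group, keeps the helper independent of what the trailing part happens to contain.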
Upvotes: 1
Reputation: 14906
Not sure what language you're using, but regular expressions may not be necessary for this if you've got a list of URLs already.
In C#, you could do something like this:
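string a = "http://example.com/index.php?p=Hey";
int cut = a.IndexOfAny(new char[] {'?', '#'});
string b = cut >= 0 ? a.Remove(cut) : a; // IndexOfAny returns -1 when neither character is present, and Remove(-1) would throw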
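The same idea carries over to other languages. A rough Python equivalent, assuming Python were the language in use (the question does not say), might look like the sketch below. The function name cut_at_query is hypothetical.

```python
def cut_at_query(url: str) -> str:
    """Cut the URL at the first '?' or '#', if there is one (hypothetical helper)."""
    # str.find returns -1 when the character is absent, so drop those results.
    positions = [i for i in (url.find("?"), url.find("#")) if i != -1]
    # No marker found: return the URL unchanged.
    return url[:min(positions)] if positions else url


print(cut_at_query("http://example.com/index.php?p=Hey"))
```

Taking the smallest index means the cut happens at whichever of the two characters comes first, which matches what IndexOfAny does in the C# version.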
Upvotes: 1