BCsongor

Reputation: 869

Regex ISSUE - can't match a URL that ends with nothing

Hey guys. I'm trying to write a regex that matches all URLs like these:

  1. http://example.com
  2. http://example.com/
  3. http://example.com/index.html
  4. http://example.com/index
  5. http://example.com/index/
  6. http://www.example.com
  7. http://www.example.com/
  8. http://www.example.com/index.html
  9. http://www.example.com/index
  10. http://www.example.com/index/

For URLs that contain '#' or '?', the match should stop at the character just before them. For example: http://example.com/index.php?p=Hey -> http://example.com/index.php

The regex code I have so far works well when selecting only certain file types or a folder except one case:

Any help is appreciated. Thanks everyone.


This is the regex:

^(?<protocol>http(s?))://(?<domain>[^/\r\n#?]+)(?<path>/[^?#]*(?:html|php|/))?
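A minimal way to reproduce the problem, sketched in Python. This assumes the failing case is a path with no extension and no trailing slash (item 4 in the list above). The named groups are rewritten as (?P<name>...) because Python's re module uses that spelling; the helper name matched_prefix is made up for illustration.

```python
import re

# The pattern from the question, with (?<name>...) rewritten as (?P<name>...)
# for Python's re module. Nothing else is changed.
PATTERN = re.compile(
    r"^(?P<protocol>http(s?))://(?P<domain>[^/\r\n#?]+)"
    r"(?P<path>/[^?#]*(?:html|php|/))?"
)

def matched_prefix(url: str) -> str:
    """Return the part of the URL that the pattern matches ('' if none)."""
    m = PATTERN.match(url)
    return m.group(0) if m else ""

for url in (
    "http://example.com/index.html",
    "http://example.com/index/",
    "http://example.com/index",  # no extension, no trailing slash
):
    print(url, "->", matched_prefix(url))
```

The path group has to end in html, php or /, so for the last URL the optional group is skipped and only the protocol and domain are matched.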

Upvotes: 1

Views: 322

Answers (2)

morja

Reputation: 8560

This might do what you want:

^(?<protocol>http(s?))://(?<domain>[^/\s#?]+)(?<path>/[^\s#?]*)?(?<query>.*)?

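The query group captures everything from the first '?' or '#' onward, which is the part you want to ignore. The protocol, domain and path groups together give the URL you are after.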
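Here is a rough sketch of how this pattern could be used, written in Python since the question does not say which language is in play. The group syntax is changed to (?P<name>...) for Python's re module, and the function name strip_query and the choice to return non-matching input unchanged are assumptions made for the example.

```python
import re

# Same pattern as above, with Python-style named groups.
URL_RE = re.compile(
    r"^(?P<protocol>http(s?))://(?P<domain>[^/\s#?]+)"
    r"(?P<path>/[^\s#?]*)?(?P<query>.*)?"
)

def strip_query(url: str) -> str:
    """Rebuild the URL from protocol, domain and path, dropping the rest."""
    m = URL_RE.match(url)
    if m is None:
        return url  # not an http(s) URL: returned unchanged (assumption)
    # The path group is optional, so it is None for "http://example.com".
    return f"{m.group('protocol')}://{m.group('domain')}{m.group('path') or ''}"

print(strip_query("http://example.com/index.php?p=Hey"))
print(strip_query("http://www.example.com/index"))
```

Because the path group no longer has to end in a particular extension or slash, a URL like http://example.com/index is kept whole.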

Upvotes: 1

Town

Reputation: 14906

Not sure what language you're using, but regular expressions may not be necessary for this if you've got a list of URLs already.

In C#, you could do something like this:

string a = "http://example.com/index.php?p=Hey";
int i = a.IndexOfAny(new[] { '?', '#' });  // -1 when neither character is present
string b = i >= 0 ? a.Remove(i) : a;       // "http://example.com/index.php"
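If you are not on .NET, the same idea carries over to other languages. A small Python sketch, where the function name strip_query_and_fragment is made up for illustration; it also covers URLs that contain neither character, which matter here because most of the URLs in the question have no '?' or '#' at all:

```python
def strip_query_and_fragment(url: str) -> str:
    """Cut the URL at the first '?' or '#', whichever comes first."""
    cut = len(url)  # default: keep the whole string
    for ch in "?#":
        pos = url.find(ch)  # -1 when the character is absent
        if pos != -1:
            cut = min(cut, pos)
    return url[:cut]

print(strip_query_and_fragment("http://example.com/index.php?p=Hey"))
print(strip_query_and_fragment("http://example.com/index"))
```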

Upvotes: 1
