sgtz

Reputation: 9019

string to parse out a URL

I got this regex from "JavaScript: The Good Parts" (p. 66), but I can't get it to work. Can anyone see what is wrong with it?

/^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/

It's supposed to split up a string like this:

https://stackoverflow.com/questions/ask

into its constituents: scheme, slash, host, port, path, query, and hash.

BTW: this regex needs to be generic, since it's going to be used with different schemes.
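For reference, here is a minimal sketch of how a pattern like this is typically applied in JavaScript, using `exec` and reading the capture groups by position. The helper name `splitUrl` and the `names` array are illustrative assumptions, not part of the original post; the pattern itself is copied unchanged from above.

```javascript
// Pattern copied verbatim from the question.
const parseUrl = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

// Labels for capture groups 1..7, in the order listed in the question.
const names = ["scheme", "slash", "host", "port", "path", "query", "hash"];

// splitUrl is a hypothetical helper: it returns an object of named parts,
// or null when the input does not match the pattern at all.
function splitUrl(url) {
  const result = parseUrl.exec(url);
  if (result === null) return null;
  const parts = {};
  names.forEach((name, i) => {
    // Optional groups that did not participate in the match are undefined.
    parts[name] = result[i + 1];
  });
  return parts;
}

const parts = splitUrl("https://stackoverflow.com/questions/ask");
console.log(parts);
```

Note that the path group captures the text after the leading slash (`questions/ask`), and that port, query, and hash come back as `undefined` for this input because those groups are optional.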

Upvotes: 0

Views: 389

Answers (4)

Petr Behenský

Reputation: 620

I don't know what every part of the regex means, but the last # character should be escaped with a backslash:

/^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:\#(.*))?$/
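A quick way to check whether the escape makes a difference in JavaScript is to run both versions against the same input and compare the captures. This is only a sketch: the sample URL with a query and a fragment is made up for the test, and the variable names are assumptions. In a JavaScript regex literal without the `u` flag, `#` is not a metacharacter, so the two patterns are expected to behave identically; the escape would matter in engines with a free-spacing mode, such as .NET with `RegexOptions.IgnorePatternWhitespace`, where an unescaped `#` starts a comment.

```javascript
// Original pattern from the question (unescaped '#').
const unescaped = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

// Pattern from this answer (last '#' escaped).
const escaped = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:\#(.*))?$/;

// Hypothetical sample input that exercises the query and hash groups.
const sample = "https://stackoverflow.com/questions/ask?tab=votes#top";

const a = unescaped.exec(sample);
const b = escaped.exec(sample);

// Compare every capture group position by position.
const same =
  a !== null &&
  b !== null &&
  a.length === b.length &&
  a.every((value, i) => value === b[i]);

console.log(same);
console.log(a[6], a[7]); // query and hash captures
```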

Upvotes: 0

Neo

Reputation: 2405

If this is in JavaScript, try:

result = subject.match(/\b(https?|ftp):\/\/([\-A-Z0-9.]+)(\/[\-A-Z0-9+&@#\/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/ig);
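One thing to watch with the line above: with the `g` flag, `String.prototype.match` returns only the full matches, not the capture groups. The sketch below contrasts the two forms; the `subject` string and variable names are made up for illustration.

```javascript
// Hypothetical input containing one URL inside other text.
const subject = "See https://stackoverflow.com/questions/ask for details";

// With the g flag: an array of whole matches, no capture groups.
const withG = subject.match(/\b(https?|ftp):\/\/([\-A-Z0-9.]+)(\/[\-A-Z0-9+&@#\/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/ig);

// Without the g flag: the first match plus its capture groups.
const withoutG = subject.match(/\b(https?|ftp):\/\/([\-A-Z0-9.]+)(\/[\-A-Z0-9+&@#\/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i);

console.log(withG);        // whole matches only
console.log(withoutG[1]);  // scheme
console.log(withoutG[2]);  // host
console.log(withoutG[3]);  // path, including the leading slash
```

This pattern only accepts http, https, and ftp, so it would need its first group widened to cover other schemes.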

Upvotes: 0

rejj

Reputation: 1216

Your question is tagged C#, so why don't you just use the System.Uri class?

For example:

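string s = "http://stackoverflow.com/questions/ask";
Uri uri = new System.Uri(s);

string scheme = uri.Scheme;      // "http"
string host = uri.DnsSafeHost;   // "stackoverflow.com"
int port = uri.Port;             // 80, the default port for http
string path = uri.AbsolutePath;  // "/questions/ask"
string query = uri.Query;        // empty string when there is no query
string hash = uri.Fragment;      // empty string when there is no fragment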

Upvotes: 1

Matías Fidemraizer

Reputation: 64943

Maybe this isn't your goal, but why don't you use the System.Uri class?

It has what you want, and it parses raw URIs/URLs.

http://msdn.microsoft.com/en-us/library/system.uri.aspx
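Since the regex in the question comes from a JavaScript book, it may be worth noting that JavaScript has a comparable built-in: the standard `URL` class, available as a global in browsers and in Node.js. The sketch below is an analogue of the System.Uri approach, not a C# example, and the `urlParts` object is just an illustrative name.

```javascript
// Parse with the built-in URL class instead of a hand-written regex.
const u = new URL("https://stackoverflow.com/questions/ask");

// Collect the parts under the labels used in the question.
const urlParts = {
  scheme: u.protocol, // includes the trailing colon
  host: u.hostname,
  port: u.port,       // empty string when no explicit port is given
  path: u.pathname,   // includes the leading slash
  query: u.search,    // empty string when there is no query
  hash: u.hash,       // empty string when there is no fragment
};

console.log(urlParts);
```

Unlike the regex, `URL` throws a `TypeError` on input it cannot parse, so untrusted strings should be wrapped in `try`/`catch`.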

Upvotes: 4
