el_pup_le

Reputation: 12179

URL extraction from string

I found a regular expression that is supposed to capture URLs, but it misses some of them.

$("#links").change(function() {

    //var matches = new array();
    var linksStr = $("#links").val();
    var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
    var matches = linksStr.match(pattern);

    for(var i = 0; i < matches.length; i++) {
      alert(matches[i]);
    }

})

It doesn't capture this URL (I need it to):

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar

But it does capture this one:

http://www.wupload.com

Upvotes: 1

Views: 717

Answers (3)

Brock Adams

Reputation: 93443

Several things:

  1. The main reason it didn't work is that, when passing a string to RegExp(), you need to escape the backslashes (write \\ for each \ that the regex should see). So this:

    "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
    

    Should be:

    "^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
    


  2. Next, you said that Firefox reported, "Regular expression too complex". This suggests that linksStr holds several lines of URL candidates.
    Therefore, you also need to pass the m flag to RegExp(), so that ^ and $ match at the start and end of each line.

  3. The existing regex rejects legitimate values, e.g. "HTTP://STACKOVERFLOW.COM". So, also use the i flag with RegExp().

  4. Whitespace always creeps in, especially in multiline values. Use a leading \s* in the pattern and $.trim() on each match to deal with it.

  5. Relative links, e.g. /file/63075291/LlMlTL355-EN6-SU8S.rar, are not matched. Is that intentional?
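To illustrate item 1, here is a minimal sketch of what happens to the backslashes. It uses a shortened, hypothetical pattern (just the host-name character class from the question) rather than the full URL regex, and plain console.log instead of jQuery:

```javascript
// Inside an ordinary string literal, "\d" and "\." collapse to "d" and "."
// before RegExp() ever sees them, so the backslashes must be doubled.
var unescaped = new RegExp("^[\da-z\.-]+$");   // effectively /^[da-z.-]+$/
var escaped   = new RegExp("^[\\da-z\\.-]+$"); // effectively /^[\da-z\.-]+$/
var literal   = /^[\da-z\.-]+$/;               // a regex literal needs no doubling

var sample = "63075291";                       // digits taken from the question's URL
console.log(unescaped.test(sample)); // false: digits are not in [da-z.-]
console.log(escaped.test(sample));   // true
console.log(literal.test(sample));   // true
```

If the pattern does not have to be built from a string, the regex-literal form is probably the easiest way to avoid the double escaping altogether.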

Putting it all together (except for item 5), it becomes:

var linksStr    = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
                + "  http://XXXupload.co.uk/fun.exe \n "
                + " WWW.Yupload.mil ";
var pattern     = new RegExp (
                    "^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
                    , "img"
                );

var matches     = linksStr.match(pattern);
for (var J = 0, L = matches.length;  J < L;  J++) {
    console.log ( $.trim (matches[J]) );
}

Which yields:

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
http://XXXupload.co.uk/fun.exe
WWW.Yupload.mil

Upvotes: 1

Senad Meškin

Reputation: 13756

(https?:\/\/)([a-zA-Z0-9\/._%&=-]*)

This will locate URLs that start with http:// or https:// anywhere in the text.
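A rough sketch of how a pattern like this could be applied with the g flag. It assumes the hyphen sits at the end of the character class, because a class containing `_-\%` is rejected by JavaScript as an out-of-order range. The sample text and the example.com address are made up for illustration:

```javascript
// Scheme followed by a run of characters commonly seen in URLs.
// The hyphen is last in the class so it is read as a literal "-".
var urlPattern = /(https?:\/\/)([a-zA-Z0-9\/._%&=-]*)/g;

// Hypothetical sample text; example.com is only a placeholder.
var text = "Mirror: http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar "
         + "(also at https://example.com/files/a_b.zip)";

// match() returns null when nothing is found, so fall back to an empty array.
var found = text.match(urlPattern) || [];
for (var i = 0; i < found.length; i++) {
    console.log(found[i]);
}
```

Note that the class does not include characters such as ? or #, so a URL with a query string or fragment would be cut off at that point.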

Upvotes: 0

The Mask

Reputation: 17427

Why not simply use: URLS = str.match(/https?:[^\s]+/ig);
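As a quick sketch of that approach, here it is run against the multi-line sample string from Brock Adams's answer (the variable names are just for illustration):

```javascript
// Sample input reused from the accepted-style answer above.
var linksStr = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
             + "  http://XXXupload.co.uk/fun.exe \n "
             + " WWW.Yupload.mil ";

// Grab every run of non-whitespace that starts with http: or https:.
// match() returns null when nothing is found, so fall back to an empty array.
var urls = linksStr.match(/https?:[^\s]+/ig) || [];

for (var i = 0; i < urls.length; i++) {
    console.log(urls[i]);
}
```

The trade-off is that this only finds candidates with an explicit scheme, so WWW.Yupload.mil is skipped, and it does no validation of what follows the scheme.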

Upvotes: 0
