azizk

Reputation: 13

javascript regex match url

I want to get the URLs from a Bing search. I fetch the HTML, and when I apply the regex /<h2><a href="(.*?)"/g it gives me:

["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]

In my JavaScript code I used match:

html.match(/<h2><a href="(.*?)"/g);

I only want the URLs themselves, not the surrounding markup. The HTML comes from this page: http://www.bing.com/search?q=test. I've been searching all day; do I maybe have to use a group?
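Why this happens: with the g flag, String.prototype.match returns only the whole matches and discards capture groups (without g it returns just the first match together with its groups). The sketch below, assuming an engine with String.prototype.matchAll (ES2020), keeps group 1 for every match. The sample HTML string and the helper name extractH2Links are hypothetical stand-ins for the real Bing page, and the pattern assumes href directly follows `<h2><a `, as the regex in the question does.

```javascript
// Hypothetical sample standing in for the fetched Bing results HTML.
var html =
  '<h2><a href="https://www.test.com/">Test</a></h2>' +
  '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

// With the g flag, match() returns only the whole matches; group 1 is dropped.
var fullMatches = html.match(/<h2><a href="(.*?)"/g);
// fullMatches[0] === '<h2><a href="https://www.test.com/"'

// matchAll() (ES2020) yields one result per match, each with its capture groups.
// extractH2Links is a hypothetical helper name used only for this sketch.
function extractH2Links(source) {
  return Array.from(source.matchAll(/<h2><a href="(.*?)"/g), function (m) {
    return m[1]; // group 1: the text between href=" and the next quote
  });
}

console.log(extractH2Links(html));
// → [ 'https://www.test.com/', 'http://www.speedtest.net/' ]
```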

Upvotes: 1

Views: 221

Answers (3)

Kelsadita

Reputation: 1038

Use Array.map to iterate over the array of matched strings, and for each one execute the regular expression again (without the g flag) so that the captured group gives you the link.

"use strict";

var links = ['<h2><a href="https://www.test.com/"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"', 
 '<h2><a href="http://www.speedtest.net/"', 
 '<h2><a href="http://test.psychologies.com/"',
 '<h2><a href="http://www.thefreedictionary.com/test"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test"',
 '<h2><a href="http://www.wordreference.com/enfr/test"',
 '<h2><a href="http://www.sedecouvrir.fr/"',
 '<h2><a href="http://www.jeuxvideo.com/tests.htm"',
 '<h2><a href="http://en.wikipedia.org/wiki/Test"'];

var result = links.map(function (link) {
  return /<h2><a href="(.*?)"/.exec(link)[1];
});

console.log(result);
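The answer above starts from the array that match already returned. A minimal sketch of the same two steps starting from the raw HTML string, assuming that string is what was fetched (the sample markup and the name getUrls are hypothetical). It also guards against match returning null when the page has no matching tags, in which case calling map directly would throw.

```javascript
"use strict";

// Hypothetical helper: html is assumed to hold the raw search-results markup.
function getUrls(html) {
  // Step 1: collect every full match (the g flag drops capture groups here).
  // match() returns null when nothing matches, so fall back to an empty array.
  var tags = html.match(/<h2><a href="(.*?)"/g) || [];

  // Step 2: re-run a non-global copy of the pattern on each tag to read group 1.
  // Without the g flag, exec() always starts at index 0, so reusing it is safe.
  var single = /<h2><a href="(.*?)"/;
  return tags.map(function (tag) {
    return single.exec(tag)[1];
  });
}

var sample =
  '<li><h2><a href="http://fr.wikipedia.org/wiki/Test">Test</a></h2></li>' +
  '<li><h2><a href="http://www.jeuxvideo.com/tests.htm">Tests</a></h2></li>';

console.log(getUrls(sample));
console.log(getUrls("<p>no results</p>")); // → []
```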

Upvotes: 1

JayC

Reputation: 7141

If this is being done within a browser, there's really no need to try to use a regex.

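// Runs against the page currently loaded in the browser.
// Note: this lists every link on the page, not only those inside <h2>.
var myNodeList = document.getElementsByTagName('a');
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href);
}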

But as hinted in the comments, if you really want to use regexes, all you need to do is call exec in a loop and iterate over the results, as shown in How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()? In particular, note these lines:

while (match = re.exec(url)) {
     params[decode(match[1])] = decode(match[2]);
}
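Applied to this question, that exec loop might look like the sketch below (the sample string and the name collectHrefs are hypothetical). The regex must carry the g flag: exec then advances re.lastIndex after each hit and returns null once no matches remain, which ends the loop. The match variable is declared explicitly so the loop also works in strict mode.

```javascript
"use strict";

// Hypothetical helper: walks every <h2><a href="..."> occurrence in html.
function collectHrefs(html) {
  var re = /<h2><a href="(.*?)"/g; // g is required: exec() advances re.lastIndex
  var urls = [];
  var match;
  // exec() returns null after the last match, which ends the loop.
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]); // match[0] is the whole tag, match[1] the captured URL
  }
  return urls;
}

var page =
  '<h2><a href="http://www.thefreedictionary.com/test">a</a></h2>' +
  '<h2><a href="http://www.wordreference.com/enfr/test">b</a></h2>';

console.log(collectHrefs(page));
```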

Upvotes: 0

Amit Joki

Reputation: 59232

With the g flag, match gives you an array of full matches, so you need to map over it, and you need a capturing group to keep just the URL. Something like this:

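var urls = html.match(/<h2><a href="(.*?)"/g).map(function(str){
   // Keep only capture group 1: the text between href=" and the next quote.
   return str.replace(/.*href="([^"]+).*/, "$1");
});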

Upvotes: 0
