Reputation: 13
I want to extract the URLs from a Bing search results page. I fetch the HTML, and when I apply this regex
/<h2><a href="(.*?)"/g
it gives me:
["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]
In my JavaScript code I used match:
html.match(/<h2><a href="(.*?)"/g);
I only want the URLs. The HTML is here: http://www.bing.com/search?q=test. I have been searching all day, and I think I may need to use a capture group?
Upvotes: 1
Views: 221
Reputation: 1038
Use Array.prototype.map to iterate over the array of matched strings, then run the regular expression (without the g flag) on each string with exec and read the URL from capture group 1.
"use strict";
var links = ['<h2><a href="https://www.test.com/"',
'<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"',
'<h2><a href="http://www.speedtest.net/"',
'<h2><a href="http://test.psychologies.com/"',
'<h2><a href="http://www.thefreedictionary.com/test"',
'<h2><a href="http://fr.wikipedia.org/wiki/Test"',
'<h2><a href="http://www.wordreference.com/enfr/test"',
'<h2><a href="http://www.sedecouvrir.fr/"',
'<h2><a href="http://www.jeuxvideo.com/tests.htm"',
'<h2><a href="http://en.wikipedia.org/wiki/Test"'];
var result = links.map(function (link) {
    return /<h2><a href="(.*?)"/.exec(link)[1];
});
console.log(result);
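The two steps above (a global match followed by a per-string exec) can also be collapsed into a single pass over the raw HTML. This is only a minimal sketch, assuming an ES2020 environment where String.prototype.matchAll is available; the sampleHtml string and the extractUrls name are made up for illustration and are not Bing's actual markup.

```javascript
"use strict";

// Hypothetical stand-in for the downloaded page; real Bing markup may differ.
var sampleHtml =
  '<li><h2><a href="https://www.test.com/">Test</a></h2></li>' +
  '<li><h2><a href="http://www.speedtest.net/">Speedtest</a></h2></li>';

// Hypothetical helper: returns only capture group 1 of every match.
function extractUrls(html) {
  // matchAll requires the g flag and yields full match objects,
  // so the capture groups are kept (unlike html.match with g).
  var matches = html.matchAll(/<h2><a href="(.*?)"/g);
  return Array.from(matches, function (m) {
    return m[1];
  });
}

console.log(extractUrls(sampleHtml));
```

Calling extractUrls on a string with no matching links should simply give back an empty array, so no null check is needed here.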
Upvotes: 1
Reputation: 7141
If this is being done within a browser, there's really no need to try to use a regex.
var myNodeList = document.getElementsByTagName('a');
// Note: this lists every anchor on the page, not only those inside <h2>.
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href);
}
But, as hinted in the comments, if you really want to use a regex, all you need to do is iterate over the matches, as shown in "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?". In particular, note these lines:
while (match = re.exec(url)) {
params[decode(match[1])] = decode(match[2]);
}
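Adapted to the regex from the question, that loop might look like the sketch below. The collectUrls name and the pageHtml string are assumptions for illustration only. The key point is that a regex with the g flag remembers its position in lastIndex, so each exec call returns the next match together with its capture groups.

```javascript
"use strict";

// Hypothetical sample input, not actual Bing markup.
var pageHtml =
  '<h2><a href="http://en.wikipedia.org/wiki/Test">Test</a></h2>' +
  '<h2><a href="http://www.wordreference.com/enfr/test">test</a></h2>';

// Hypothetical helper: collects capture group 1 from every match.
function collectUrls(html) {
  var re = /<h2><a href="(.*?)"/g; // g flag makes exec advance via lastIndex
  var urls = [];
  var match;
  // exec returns null once there are no more matches, which ends the loop.
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

console.log(collectUrls(pageHtml));
```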
Upvotes: 0
Reputation: 59232
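With the g flag, match returns an array of the full matched strings and drops the capture groups. Map over that array and pull the group out of each string, something like this:
var urls = (html.match(/<h2><a href="(.*?)"/g) || []).map(function (str) {
    return str.replace(/.*href="([^"]+).*/, "$1");
});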
Upvotes: 0