skulpt

Reputation: 527

Get all URLs from an external URL

I'm trying to get all URLs from a page using jQuery so I can call them later with $.get(). If the links were on the same page that includes the script, it would be no problem to call something like:

var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
    alert(links[i].href);
}

In this case I'd just use alert to check that the links were actually parsed. But how can I do the same thing with a URL that is not the current page? Any help would be appreciated. Maybe I'm missing something ridiculously simple, but I am really stumped when it comes to anything JavaScript/jQuery related.

Upvotes: 1

Views: 1872

Answers (3)

varbrad

Reputation: 474

You will have to fetch the other page via an HTTP request ($.get in jQuery does this), and then either convert the returned HTML into a DOM that jQuery can traverse to find the <a> tags for you, or use another method, such as a regular expression, to find all the links within the returned markup.

edit: Probably don't actually use a regex unless you have a guaranteed HTML format and can guarantee the format of all <a> tags on the page. By this point, it's probably just easier to parse the HTML for real.
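
As a rough illustration of "parsing the HTML for real", here is a minimal sketch that reads the raw href attribute of every anchor in an HTML string. The helper name extractHrefs is hypothetical and not from this thread. The parser is passed in as an argument on the assumption that the browser's built-in DOMParser will be supplied; fetching the HTML (for example with $.get) is left out.

```javascript
// Sketch only: extractHrefs is a hypothetical helper name.
// `parser` is expected to behave like the browser's DOMParser.
function extractHrefs(html, parser) {
  // Build a separate document from the markup instead of regex-matching it
  var doc = parser.parseFromString(html, "text/html");
  // Only anchors that actually carry an href attribute
  var anchors = doc.querySelectorAll("a[href]");
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    // getAttribute returns the value exactly as written in the markup
    hrefs.push(anchors[i].getAttribute("href"));
  }
  return hrefs;
}

// Browser usage (assumes htmlString holds the fetched markup):
// var hrefs = extractHrefs(htmlString, new DOMParser());
```

Reading the attribute rather than the href property keeps relative links exactly as the remote page wrote them, so they can be resolved against the right base URL afterwards.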

Upvotes: 1

Sinha

Reputation: 512

Get the current page URL using window.location.href, then compare it with the href of each "a" tag in the loop:

var links = document.getElementsByTagName("a");
var thisHref = window.location.href;
for (var i = 0; i < links.length; i++) {
    // Declared with var so it does not leak into the global scope
    var tempLink = links[i].href;
    if (tempLink !== thisHref) { // skip links that point to the current page URL
        alert(tempLink);
    }
}

Upvotes: 0

SethWhite

Reputation: 1977

Blatantly copying this answer by Nick Craver (go upvote it), but modifying it for your use case:

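$.get("page.html", function(html) {
  // Parse the markup and wrap it in a container so that
  // top-level <a> elements are matched by find() as well
  var links = $("<div>").append($.parseHTML(html)).find("a");
  //do stuff with links
});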

Note that this will only work if the page you're hitting is on the same origin or is set up to allow cross-origin requests. If it isn't, you'll need to do the same with a DOM parser on a backend server. Node.js has some great options there, including jsdom.
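
One detail worth a sketch: anchors parsed in the browser generally resolve relative href values against the current page, not the page they were fetched from. Assuming you have the raw attribute values (for example from getAttribute("href")) and know the remote page's URL, the standard URL constructor can resolve them. The helper name resolveLinks and the example.com addresses below are made up for illustration.

```javascript
// Sketch only: resolveLinks is a hypothetical helper name.
// hrefs   - raw href attribute values taken from the fetched page
// pageUrl - absolute URL of the page those hrefs came from
function resolveLinks(hrefs, pageUrl) {
  var resolved = [];
  for (var i = 0; i < hrefs.length; i++) {
    try {
      // new URL(relative, base) resolves against the remote page's URL
      resolved.push(new URL(hrefs[i], pageUrl).href);
    } catch (e) {
      // Skip values the URL constructor rejects
    }
  }
  return resolved;
}

var absolute = resolveLinks(
  ["/about", "next.html", "https://other.example/x"],
  "https://example.com/docs/index.html"
);
// absolute[0] is "https://example.com/about"
// absolute[1] is "https://example.com/docs/next.html"
// absolute[2] is "https://other.example/x"
```

The resulting absolute URLs can then be passed to $.get() one by one, subject to the same cross-origin restriction.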

Upvotes: 2
