CHawk

Reputation: 1356

jQuery/PHP - Grabbing all links from an external page

I am trying to make a program that grabs all the links from an external website and displays them using jQuery and PHP. Here are my steps:

  1. Get the html of a page using php (load.php)
  2. Put that html into a div
  3. Get all elements in that div

Here is my code (index.html):

<html>
<head>
    <title>Test</title>
    <script type="text/javascript" src="jquery.js"></script>
    <script type="text/javascript">
        $(function() { //on load
            var url = "http://google.com";
            $.post('load.php', { url: url},
                function(html) {
                    $('#page').html(html); //loads html from the page into a div

                    var links = $('#page > a');
                    alert('links.length: ' + links.length); //PROBLEM: returns 0 
                    for(var i=0; i < links.length; i++)
                    {
                        alert(links[i]);
                    }
            });
        });
    </script>
</head>
<body>
<div id="page" style=""></div>
</body>
</html>

And the php code (load.php):

<?php
$url = $_POST['url'];
$html = file_get_contents($url);
echo $html;
?>

The page is being loaded into the div correctly, so I know it is grabbing the html, but links.length is returning 0. So something must be wrong with this line:

var links = $('#page > a');

However, when I try to load it on my test page with html:

<a href="http://google.com">link1</a>
<a href="http://yahoo.com">link2</a>

links.length returns 2. Why does it work with my test page and not google?

Upvotes: 0

Views: 963

Answers (3)

yaka

Reputation: 914

Among the other things to consider (like what Roman mentioned), if you want to find all the anchors, try this:

$('#page a');
// OR
$('#page').find('a');


Note: parent > child selects only "direct" child elements, so anchors nested deeper inside the fetched page are missed.

Upvotes: 2

optimusprime619

Reputation: 764

@CHawk for some reason it appears to me that when you set the scraped content from the source as the content of the div, it gets treated as text inside the div rather than as a bunch of HTML elements... but I'm baffled that it works with the test page. I'd suggest you try pulling out some other HTML element that's been scraped to confirm this. I'd also suggest some of the other options, like an HTML DOM parser or using a regex to get the content. Let us know how it works out. Cheers!!

Upvotes: 0

Roman

Reputation: 6428

Probably because your test page contains a document fragment (only the 2 links), while a page like Google's contains a whole document (starting with a doctype declaration and <html> and so on...).

Inserting such HTML into a div element probably breaks your DOM.

I'd advise you to either:

  1. parse the HTML server-side and pass only the results to your JS app,
    OR
  2. load the page (from your server) in an iframe and access its document to get to its link collection (documentOfIframe.links)
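
Option 1 could be sketched with PHP's built-in DOMDocument, replacing the body of load.php so it returns only the hrefs as JSON instead of the whole page (the JSON response shape here is just one possible choice, not something from the question):

```php
<?php
// Fetch the page as before, then parse it server-side.
$url  = $_POST['url'];
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world pages rarely validate cleanly
$doc->loadHTML($html);
libxml_clear_errors();

// Collect every anchor's href attribute.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

header('Content-Type: application/json');
echo json_encode($links);
?>
```

The jQuery side could then consume the array directly, e.g. `$.post('load.php', { url: url }, function(links) { ... }, 'json')`, with no need to inject a full document into a div at all.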

Upvotes: 2
