Peter

Reputation: 956

jQuery load body of a page

I am trying to load a page's body just like here: jQuery: Load body of page into variable.

However, no one in that thread provided a working solution, because .load() strips the <!DOCTYPE>, <html> and <body> tags by default (afaik). I switched to $.get() and already get the page's entire content as a string, but I am not able to extract just the <body> tag (or rather: what's inside the <body> tag).

So far I have tried:

$.get(uri, function(data){
console.log(data); // --> the entire page's content is logged
});

$.get(uri, function(data){
console.log($(data)); // --> i guess that's the entire site as an object
});

$.get(uri, function(data){
console.log($(data).find("body")); // --> this should be the <body> tag as an object, but console just outputs "[ ]"
});

Upvotes: 3

Views: 3764

Answers (4)

Paul Grime

Reputation: 15104

jQuery will trim off the html and body tags. For example in Firebug:

$("<html><body><div id=id000><div id=id001>content</div></div></body></html>")

results in:

[div#id000]

and clicking on that in the Firebug console shows this:

<div id="id000">
    <div id="id001">content</div>
</div>

So you shouldn't need to find the body tag yourself, as the only content left will be that which was inside the original body tag.

EDIT BASED ON COMMENT:

Maybe some simple parsing is required beforehand to remove the <head> element. The following assumes you are only interested in the content that follows a <body> tag.

// try and find the body start tag
var match = /<body/i.exec(loadedContent);
// exec() returns null when nothing matches, so test the result itself
if (match) {
    // if found, trim the loadedContent
    loadedContent = loadedContent.substring(match.index);
}
// jQuery will do the rest
var $content = $(loadedContent);

With loadedContent set to:

<html><head><title>title</title></head><body><div id=id000><div id=id001>content</div></div></body></html>

this gives the same <div> elements as above, i.e. the <title> tag is not used.

Upvotes: 1

ShadowScripter

Reputation: 7369

Explanation

Hm.. let's see if I can properly demonstrate this.

$.get() is a shorthand for $.ajax().

So when you do this

$.get(uri, function(data){
    console.log(data); // --> the entire page's content is logged
});

You're really doing this

$.ajax({
    url: uri,
    type: "GET",
    success: function(msg){
        console.log(msg);
    }
});

And by default, it returns the page as HTML. Or rather, by default, jQuery guesses the data type (XML, JSON, script, or HTML) from the MIME type of the response, and an ordinary page arrives as an HTML string. If you want to tell it what you would like returned, you can either set the MIME type on the server page, or you could use $.getJSON().

If you want the data returned from your request in the form of an object, JSON is the way to go. The only real difference in the code is to either

  • replace your $.get() with $.getJSON()

    $.getJSON(uri, function(data){
        console.log(JSON.stringify(data));
    });
    

or

  • add dataType: "json" in the $.ajax()

    $.ajax({
        url: uri,
        type: "GET",
        dataType: "json",
        success: function(data){
            console.log(JSON.stringify(data));
        }
    });
    

so it can expect JSON data to be returned from the page.

Now all you need to do is prepare the data on the server side, using PHP's json_encode():

$output = array(
    "msg" => "This is output", 
    "data" => array(
        "info" => "Spaaaace", 
        "cake" => "no"
    ), 
    array(
        "foo", 
        "bar"
    )
);
echo json_encode($output); 
//it will look like this before the JSON text is parsed into an object in JavaScript
//{"msg":"This is output","data":{"info":"Spaaaace","cake":"no"},"0":["foo","bar"]}

This is the way to go if you want objects returned from a request.
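To make the shape of that response concrete, here is a minimal sketch of what the callback ends up with. It assumes the server sends exactly the sample string shown in the PHP comment above; JSON.parse() stands in for the parsing that $.getJSON() performs before calling your function, and the variable names are only illustrative.

```javascript
// Sample payload copied from the json_encode() output shown above.
var responseText = '{"msg":"This is output","data":{"info":"Spaaaace","cake":"no"},"0":["foo","bar"]}';

// $.getJSON() performs this parsing step before it calls the callback.
var data = JSON.parse(responseText);

console.log(data.msg);       // "This is output"
console.log(data.data.info); // "Spaaaace"
// The PHP array that had no key of its own ends up under the key "0".
console.log(data["0"][1]);   // "bar"
```

Note that the unnamed PHP array is reachable only as data["0"], which is easy to overlook; giving it a name on the server side is probably more readable.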


Solution

Apart from the server-side fix with json_encode(), this is the solution:

$.getJSON(uri, function(data){
    console.log(JSON.stringify(data)); 
});

Alternative solution

Assuming you want to keep your $.get(), all you need is the text between <body> and </body>. Here's an example:

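$.get(uri, function(msg){
    var startWith = "<body>",
        endWith = "</body>";
    // indexOf() does a plain string search; search() would treat the argument as a regex
    var iStart = msg.indexOf(startWith);
    var iEnd = msg.indexOf(endWith);

    // only trim when both tags were actually found
    if (iStart !== -1 && iEnd !== -1) {
        msg = msg.substring(iStart + startWith.length, iEnd);
    }
    console.log(msg);
});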

And here's a more advanced answer on that one.
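One limitation of searching for the literal string "<body>" is that it misses an opening tag that carries attributes or is written in upper case. A slightly more tolerant sketch is below; extractBodyHtml is a hypothetical helper name, not part of jQuery, and it is still plain string matching rather than a real HTML parser (an attribute value containing ">" would confuse it).

```javascript
// Hypothetical helper (not a jQuery function): returns the markup between
// <body ...> and </body>, or the input unchanged when no <body> tag is found.
function extractBodyHtml(html) {
    // [^>]* allows attributes on the opening tag; the i flag ignores case
    var open = /<body\b[^>]*>/i.exec(html);
    if (!open) {
        return html;
    }
    // everything after the opening tag
    var rest = html.substring(open.index + open[0].length);
    // cut at the first closing tag; keep everything if it is missing
    var close = /<\/body\s*>/i.exec(rest);
    return close ? rest.substring(0, close.index) : rest;
}

var page = '<html><head><title>t</title></head><BODY class="x"><p>hi</p></BODY></html>';
console.log(extractBodyHtml(page)); // <p>hi</p>
```

Inside the callback you would call it on the response, e.g. $.get(uri, function(msg){ console.log(extractBodyHtml(msg)); });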

Upvotes: 4

Jules

Reputation: 7223

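Did you try this?

$.get(uri, function(data) {

   // data is a plain string, so parse it into a document first (browser DOMParser)
   var doc = new DOMParser().parseFromString(data, 'text/html');
   console.log('<body>' + $(doc).find('body').html() + '</body>');

});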

Upvotes: 0

gen_Eric

Reputation: 227270

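You can try reading the HTML data as XML instead. Note that this only works if the page is well-formed XML (e.g. XHTML); otherwise the XML parse fails.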

$.get(uri, function(data){
   console.log($(data).find("body"));
}, 'xml');

Upvotes: 0
