Franz

Reputation: 11553

Firefox extensions & XUL: get page source code

I am developing my first Firefox extension and for that I need to get the complete source code of the current page. How can I do that with XUL?

Upvotes: 12

Views: 7293

Answers (6)

Daniel Gerson

Reputation: 2209

More in line with Lachlan's answer, but there is an in-depth discussion of the internals here, going down into the C++ code:

http://www.mail-archive.com/[email protected]/msg05391.html

Then follow the replies linked at the bottom of that message.

Upvotes: 0

Lachlan Roche

Reputation: 25966

You will need a XUL <browser> element to load the content into.

Load the "view-source:" version of your page into the browser element, the same way the "View Page Source" menu item does. See the function viewSource() in chrome://global/content/viewSource.js; that function can load from the cache or not.

Once the content is loaded, the original source is given by:

var source = browser.contentDocument.getElementById('viewsource').textContent;
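A minimal sketch of the steps above, assuming `browser` is a XUL <browser> element already present in your overlay and that it exposes loadURI(), addEventListener() and contentDocument as XUL browsers did in Firefox versions of that era (later versions changed the loadURI() signature). The helper name fetchPageSource is hypothetical:

```javascript
// Hypothetical helper: load "view-source:" + url into an existing XUL
// <browser> and pass the original source text to `callback` once the
// view-source document has loaded.
function fetchPageSource(browser, url, callback) {
  function onLoad() {
    var container = browser.contentDocument.getElementById("viewsource");
    if (!container) {
      // Not the view-source document yet (e.g. an initial about:blank);
      // keep listening for the next load.
      return;
    }
    browser.removeEventListener("load", onLoad, true);
    callback(container.textContent);
  }
  // Capturing listener, so load events from the content document are seen.
  browser.addEventListener("load", onLoad, true);
  browser.loadURI("view-source:" + url);
}
```

The listener removes itself after the first load that actually contains the #viewsource element, so later navigations in the same browser do not fire the callback again.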

Serialize a DOM Document
This method will not get the original source, but may be useful to some readers.

You can serialize the document object to a string. See Serializing DOM trees to strings in the MDC. You may need to use the alternate method of instantiation in your extension.

That article talks about XML documents, but the method also works on any HTML DOM document.

var serializer = new XMLSerializer();
var source = serializer.serializeToString(document);

This even works in a web page or in the Firebug console.
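One way to handle the "alternate method of instantiation" is to try the XMLSerializer constructor first and fall back to XPCOM when it is not defined in the current scope. This is a sketch assuming the "@mozilla.org/xmlextras/xmlserializer;1" contract ID and the nsIDOMSerializer interface described in the MDC article of that period; the helper names are hypothetical:

```javascript
// Hypothetical helper: return a serializer object with serializeToString().
function createSerializer() {
  if (typeof XMLSerializer !== "undefined") {
    // Web pages and XUL overlay windows expose the constructor directly.
    return new XMLSerializer();
  }
  // Assumed XPCOM route for scopes without the constructor.
  return Components.classes["@mozilla.org/xmlextras/xmlserializer;1"]
    .createInstance(Components.interfaces.nsIDOMSerializer);
}

// Serialize a whole document (or any node) to a string.
function serializeDocument(doc) {
  return createSerializer().serializeToString(doc);
}
```

Either way the result is a serialization of the parsed DOM, so it carries the same caveat as above: it is not the original source.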

Upvotes: 6

Eli Grey

Reputation: 35913

Use the first part of Sagi's answer, but read document.getElementById('viewsource').textContent instead of innerHTML.

Upvotes: 0

Sagi

Reputation: 8011

You can get the URL with var URL = document.location.href and then navigate to "view-source:"+URL.

Now you can fetch the whole source code (viewsource is the id of the body):

var code = document.getElementById('viewsource').innerHTML;

The problem is that this source code is marked up for syntax highlighting, so you have to strip the tags and then decode the HTML entities, i.e. what PHP's strip_tags() and htmlspecialchars_decode() do.

For example, line 1 should be the doctype and line 2 should look like:

&lt;<span class="start-tag">HTML</span>&gt;

So after strip_tags() it becomes:

&lt;HTML&gt;

And after htmlspecialchars_decode() we finally get expected result:

<HTML>

The source is not run through the DOM parser, so invalid HTML shows up unchanged too.
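strip_tags() and htmlspecialchars_decode() are PHP functions, so in extension JavaScript you need equivalents. A rough sketch with hypothetical helper names; the regexes are adequate for the markup the view-source page generates (where every literal "<" in the page is already escaped as "&lt;"), not for arbitrary HTML:

```javascript
// Remove every tag, keeping only the text between them.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Decode the entities that htmlspecialchars() produces.
function decodeSpecialChars(text) {
  // Decode &amp; last so that "&amp;lt;" becomes "&lt;", not "<".
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#0?39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Turn the highlighted innerHTML of #viewsource back into plain source.
function sourceFromViewSourceHTML(html) {
  return decodeSpecialChars(stripTags(html));
}
```

Applied to the example line above, sourceFromViewSourceHTML('&lt;<span class="start-tag">HTML</span>&gt;') gives <HTML>. Reading textContent instead of innerHTML, as another answer suggests, yields the same text without any of this.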

Upvotes: 2

Phil Rykoff

Reputation: 12087

It really looks like there is no way to get "all the source code". You can use

document.documentElement.innerHTML

to get the innerHTML of the top element (usually <html>). If you have a PHP error message before the markup, like

<h3>fatal error</h3>
segfault

<html>
    <head>
        <title>bla</title>
        <script type="text/javascript">
            alert(document.documentElement.innerHTML);
        </script>
    </head>
    <body>
    </body>
</html>

the innerHTML would be

<head>
<title>bla</title></head><body><h3>fatal error</h3>
segfault    
        <script type="text/javascript">
            alert(document.documentElement.innerHTML);
        </script></body>

so the error message is still retained, but the parser has moved it into the <body>.

Edit: document.documentElement is described here: https://developer.mozilla.org/en/DOM/document.documentElement
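If you want something closer to a full page from the DOM, you can prepend a doctype rebuilt from document.doctype to documentElement.outerHTML, which, unlike innerHTML, includes the <html> tag itself. A sketch with hypothetical helper names, assuming element outerHTML is available (Firefox 11 and later). The output is still a re-serialization of the parsed DOM, so the misplaced error message above stays inside <body>:

```javascript
// Rebuild a doctype declaration from a DocumentType node.
function doctypeToString(dt) {
  if (!dt) return "";
  var s = "<!DOCTYPE " + dt.name;
  if (dt.publicId) {
    s += ' PUBLIC "' + dt.publicId + '"';
    if (dt.systemId) s += ' "' + dt.systemId + '"';
  } else if (dt.systemId) {
    s += ' SYSTEM "' + dt.systemId + '"';
  }
  return s + ">";
}

// Doctype (if any) followed by the serialized root element.
function serializedPage(doc) {
  var dt = doctypeToString(doc.doctype);
  return (dt ? dt + "\n" : "") + doc.documentElement.outerHTML;
}
```

Usage: serializedPage(document) in the page, or serializedPage(content.document) from extension chrome code.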

Upvotes: 2

Manuel Bitto

Reputation: 5263

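Maybe you can get it via the DOM, using

var source = document.getElementsByTagName("html")[0];

and then turn it into a string. Note that DOMParser works in the other direction (string to DOM); for element to string, XMLSerializer is its counterpart.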

https://developer.mozilla.org/En/DOMParser

Upvotes: 1
