tomermes

Reputation: 23360

viewing actual source code of a website

I'll explain my question with an example. Suppose I go to the URL: http://www.google.co.il/#q=university

and then I right-click and choose "view source". I don't get the real HTML source. I'm sure of that because when I search the code for distinctive words that appear in the document, I get no results.

I know that in Chrome I can select something and choose "inspect element" to see the real source code, but I want to fetch the code with a Java program, so I want to understand why I don't see the real HTML source when I go to "view source".

Upvotes: 4

Views: 20837

Answers (8)

Timothy Chen

Reputation: 451

You could use document.documentElement, which is the root element of the current DOM and therefore includes any changes scripts have made to the page.

console.log(document.documentElement);

Upvotes: 1

Azodious

Reputation: 13872

What word did you search?

I'd expect view source to show the complete HTML code, including the parts that are not visible on the page. Try searching again with a shorter search string, and search for the same string in Chrome as well, as you tried earlier.

Also, view source does not reflect changes that JavaScript makes to the HTML after the page has loaded.

Upvotes: 0

gibffe

Reputation: 835

In the example page you gave, each result element is generated by a JavaScript function from one of the loaded files; moreover, the script does not write the text as plain characters but as Unicode escape sequences instead.

Upvotes: 0

mrembisz

Reputation: 12870

The only way I know to get the actual source in Java, including modifications made by JavaScript, is through a virtual browser framework such as HtmlUnit.

HtmlUnit can execute JavaScript and apply the resulting changes to the DOM tree. You would have to serialize that tree to get the actual page. Keep in mind there is no such thing as "complete HTML source": you can only get the DOM tree and, if needed, serialize it.
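
To illustrate the point about serializing a DOM tree, here is a minimal sketch that uses only the JDK's built-in XML DOM rather than HtmlUnit. It parses a small well-formed document, appends a node the way a script might, and serializes the tree. The class name, the helper names, and the sample markup are made up for illustration; a real page would need an HTML-aware parser and a JavaScript engine, which is what HtmlUnit provides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration: source text and serialized DOM can differ.
class DomSerializeSketch {

    // Parse well-formed markup into a DOM tree. The JDK parser is XML-only,
    // so this accepts XHTML-style input, not arbitrary real-world HTML.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Stand-in for what a page script does: add a node after loading.
    static void appendParagraph(Document doc, String text) {
        Element p = doc.createElement("p");
        p.setTextContent(text);
        doc.getElementsByTagName("body").item(0).appendChild(p);
    }

    // Turn the current state of the tree back into text.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html><body><h1>Search</h1></body></html>";
        Document doc = parse(source);
        appendParagraph(doc, "university");
        // The word is absent from the original text but present in the tree.
        System.out.println("in source: " + source.contains("university"));
        System.out.println("in serialized DOM: " + serialize(doc).contains("university"));
    }
}
```

The serialized text is one possible rendering of the tree at that moment, which is why it need not match what the server originally sent.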

Upvotes: 2

MrHelper

Reputation: 26

Well, if you select "view source" you do see the actual HTML source code of the page whose URL is in your address bar. However, the page(s) you want to view may be "obfuscated" by embedded code that loads external content and inserts it into the HTML.

If you still want to parse such a page automatically in a "nice" way, you need to run a whole HTML interpreter such as WebKit. That is a hell of a lot of work, and in principle it is what you are doing with "inspect element". The other way is to find the lines in the page HTML that load the external content and then load that content yourself. If you are lucky, it is not obfuscated on purpose and is fairly easy to achieve for small tasks.

However, if you need the whole DOM structure, you should think about implementing one of the browser engines...
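
The second approach, finding the lines that load external content, could start with something like the sketch below. It assumes the external content is pulled in through script tags with a src attribute, and it uses a deliberately naive regular expression rather than a real HTML parser. The class name, method name, and example.com URLs are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: list the external scripts a raw HTML page refers to.
class ScriptSrcSketch {

    // Naive pattern for <script ... src="..."> or src='...'.
    // It ignores unquoted attributes, comments, and scripts added by scripts.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns absolute URLs of external scripts, resolved against pageUrl.
    // URI.resolve throws IllegalArgumentException for malformed src values.
    static List<String> scriptUrls(String pageUrl, String rawHtml) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(rawHtml);
        while (m.find()) {
            urls.add(base.resolve(m.group(2)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<script src=\"/js/app.js\"></script>"
                + "<script>inline()</script>"
                + "</head><body></body></html>";
        for (String url : scriptUrls("http://example.com/search?q=x", html)) {
            System.out.println(url);
        }
    }
}
```

Each returned URL can then be downloaded and searched separately. Whether the text you want is stored there in readable form depends on the site.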

Upvotes: 1

yatskevich

Reputation: 2093

"View source" gives you the raw response generated by the server. As Joachim Isaksson has already mentioned, use Chrome's developer tools or Firebug for Firefox to see the page as rendered.

Upvotes: 0

Jeremiah Orr

Reputation: 2630

The text you're looking for could have been generated by JavaScript. If you're using Chrome (since you mentioned it), the developer pane that comes up when you choose "inspect element" has a "Resources" tab that lists JavaScript files, stylesheets, etc.

Upvotes: 0

Joachim Isaksson

Reputation: 180867

View source usually does not show any JavaScript-generated content; to see that, you'll want a plugin such as Firebug.
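
A Java program that simply downloads the page is in the same position as "view source": it receives the server's response and nothing a script adds later. The sketch below assumes Java 11 or newer for java.net.http. The class name, the helper names, and the canned SAMPLE page are invented for illustration. Note also that the part of a URL after # is not sent to the server at all, so the query in the example URL from the question only takes effect once a script in the browser reads it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

// Hypothetical illustration of what a plain HTTP fetch can and cannot see.
class RawSourceSketch {

    // Canned page: the visible word is assembled by a script at load time,
    // so it never appears literally in the markup.
    static final String SAMPLE = "<html><body><div id=\"results\"></div>"
            + "<script>document.getElementById('results').textContent"
            + " = 'uni' + 'versity';</script></body></html>";

    // Download the response body exactly as the server sends it.
    // No JavaScript is executed here.
    static String fetchRaw(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Case-insensitive check for a word in the raw markup.
    static boolean mentions(String html, String word) {
        return html.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws Exception {
        // With no argument, use the canned page so the sketch runs offline.
        String html = args.length > 0 ? fetchRaw(args[0]) : SAMPLE;
        System.out.println("mentions 'university': " + mentions(html, "university"));
    }
}
```

A browser would display the word for the canned page, yet a search of the raw text does not find it, which matches what the question describes.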

Upvotes: 2
