Reputation: 31
After several hours of trying and reading, I'm still a bit lost on the subject in the title.
My problem: I am trying to get the full HTML content (including the HTML appended or added by JavaScript) of a single web page. What I have already tried:
So now the question is: how can I imitate a browser's "Save as" function, or, more generally, how can I get the full HTML content first and then use Jsoup to scan the final, static HTML?
Thanks a lot for your advice and your help!
Upvotes: 0
Views: 5114
Reputation: 31
I finally got what I wanted. I will try to explain it for those who need some help!
The process consists of two steps:
1 - Get the HTML content and save it
For this step, you will need to download PhantomJS and use it to fetch the content. Here is the code to get the target page. Just replace myTargetedPage.com with the URL of the page you want to get, and mySaveFile.html with the name of the output file.
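var page = require('webpage').create();
var fs = require('fs');

// The callback fires once the page has finished loading.
page.open('http://myTargetedPage.com', function (status) {
    if (status !== 'success') {
        console.log('Failed to load the page');
        phantom.exit(1);
        return;
    }
    // page.content is the HTML of the page as it currently stands in PhantomJS.
    fs.write('mySaveFile.html', page.content, 'w');
    phantom.exit();
});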
As you can see, the saved file contains the same HTML as the page loaded in your browser, including the content added by JavaScript.
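If you would rather trigger this step from Java than from a terminal, something like the following might work. It is only a sketch: it assumes the phantomjs executable is on your PATH and that the script above was saved as save_page.js (a file name I made up for the example), and the class name PhantomRunner is also just illustrative.

```java
import java.io.IOException;
import java.util.List;

class PhantomRunner {
    // Builds the command line. Assumption: "phantomjs" is available on the PATH.
    static List<String> buildCommand(String scriptPath) {
        return List.of("phantomjs", scriptPath);
    }

    // Starts PhantomJS with the given script and waits for it to finish.
    // Returns the exit code of the process (0 when the script calls phantom.exit()).
    static int run(String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(scriptPath))
                .inheritIO() // forward PhantomJS console output to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "save_page.js" is a hypothetical name for the script from step 1.
        int exitCode = run("save_page.js");
        System.out.println("phantomjs exited with code " + exitCode);
    }
}
```

Once the process has exited, the saved HTML file should be on disk and ready for the second step.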
2 - Extract the content you wanted
Now we will use Java and the Jsoup library to extract the specific content. In my example, I want to get this part of the web page:
/* HTML CONTENT */
<span class="my class" data="data1"></span>
/* HTML CONTENT */
<span class="my class" data="data2"></span>
/* HTML CONTENT */
To get this, the following code will do (don't forget to edit thePathToYourSavedFile.html):
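import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ExtractData {
    public static void main(String[] args) throws Exception {
        // Jsoup.connect() only accepts http(s) URLs, so the saved file is parsed directly.
        File input = new File("thePathToYourSavedFile.html");
        Document document = Jsoup.parse(input, "UTF-8");
        Elements spanList = document.select("span");
        for (Element span : spanList) {
            // Keep only the spans whose class attribute is exactly "my class".
            if (span.attr("class").equals("my class")) {
                String data = span.attr("data");
                System.out.println("data : " + data);
            }
        }
    }
}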
Enjoy!
Upvotes: 2
Reputation: 1955
There is a nice browser plugin that gives you what you are looking for. It offers a way to inspect a page and its functionality. It is available for some browsers but not all. Here is the link: http://chrispederick.com/work/web-developer/
P.S. After you install it, there is a little gear icon on the toolbar at the top right. That is where all the functions are.
Upvotes: 0