Reputation: 4441
I'm working on a web-based application that loads the HTML content of a URL through a call to http://www.whateverorigin.org/. This avoids violating the same-origin policy.
var url = 'http://' + document.getElementById("urlText").value;
$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent(url) + '&callback=?', function(data){
    // Parse the returned HTML string into a detached document
    var doc = new DOMParser().parseFromString(data.contents, 'text/html');
});
If I need to extract the meaningful visible text from this HTML string, is there a way to do it similar to what BeautifulSoup does in Python? I'm a beginner in JavaScript.
Upvotes: 2
Views: 154
Reputation: 119
Use jQuery to find and iterate over the appropriate elements. Then you can decide what to print out, for example the text of visible items. Here is a jsfiddle with a working script example: http://jsfiddle.net/w147o9f6/1/
<body>
<div id="outputTexts">OUTPUT:</div>
</body>
javascript:
var parser = new DOMParser();
var doc;
var meaningfulTexts = [];
$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('https://www.facebook.com') + '&callback=?', function(data){
    // Parse the fetched HTML into a detached document
    doc = parser.parseFromString(data.contents, "text/html");
    var ELMS = $(doc).find("div, p, a, span");
    ELMS.each(function(index, element) {
        var text = $(element).text().trim();
        // Only an inline display:none is detected here; stylesheet rules are not applied
        if (element.style.display !== "none" && text !== "") {
            // Append as a text node so the fetched content is not interpreted as HTML
            $("#outputTexts").append('<br>', document.createTextNode(element.tagName + ' - ' + text));
            meaningfulTexts.push(text);
        }
    });
});
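Note that the selector above visits nested elements, so the text of an inner span is reported again for every div that contains it. One way around this is to walk the text nodes directly. The following is only a rough sketch, not a full BeautifulSoup equivalent: extractVisibleText and SKIPPED_TAGS are names made up for this example, the list of skipped tags is an assumption, and hiding done through stylesheets is not detected because the parsed document is never rendered.

```javascript
// Tags whose text is not rendered on the page (assumed list; extend as needed).
var SKIPPED_TAGS = { SCRIPT: true, STYLE: true, NOSCRIPT: true, TEMPLATE: true, HEAD: true };

// Hypothetical helper: collects whitespace-normalized text from the text nodes
// under `root`, skipping non-rendered tags and elements hidden via the
// `hidden` attribute or an inline display:none.
function extractVisibleText(root) {
    var texts = [];
    (function walk(node) {
        if (node.nodeType === 3) { // text node
            var text = (node.nodeValue || '').replace(/\s+/g, ' ').trim();
            if (text !== '') {
                texts.push(text);
            }
            return;
        }
        if (node.nodeType === 1) { // element node
            var tag = String(node.tagName).toUpperCase();
            var style = node.style || {};
            if (SKIPPED_TAGS[tag] || node.hidden || style.display === 'none') {
                return; // skip this element and everything inside it
            }
        }
        // Documents, elements and other containers: descend into children
        var children = node.childNodes || [];
        for (var i = 0; i < children.length; i++) {
            walk(children[i]);
        }
    })(root);
    return texts;
}
```

Inside the getJSON callback this could be called as extractVisibleText(doc.body), and the returned array joined with newlines if a single string is wanted.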
Upvotes: 1
Reputation: 15
Is this what you need? The code below loads google.nl through whateverorigin.org and inserts the returned HTML into a div. If not, please explain what else you need.
jQuery:
$(document).ready(function() {
    $.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('http://www.google.nl') + '&callback=?', function(data){
        $('.result').html(data.contents);
    });
});
HTML:
<div class="result"></div>
Example: http://jsfiddle.net/qddekhnc/1/
Upvotes: 0