Reputation: 23542
I load HTML from other pages to extract and display data from that page:
$.get('http://example.org/205.html', function (html) {
console.log( $(html).find('#c1034') );
});
That does work but because of the $(html)
my browser tries to load images that are linked in 205.html. Those images do not exist on my domain so I get a lot of 404 errors.
Is there a way to parse the page like $(html)
but without loading the whole page into my browser?
Upvotes: 18
Views: 6911
Reputation: 941
Actually if you look in the jQuery documentation it says that you can pass the "owner document" as the second argument to $
.
So what we can then do is create a virtual document so that the browser does not automatically load the images present in the supplied HTML:
var ownerDocument = document.implementation.createHTMLDocument('virtual');
$(html, ownerDocument).find('.some-selector');
Upvotes: 18
Reputation: 606
Instead of removing all img elements altogether, you can use the following regex to delete all src attributes instead:
html = html.replace(/src="[^"]*"/ig, "");
Upvotes: 0
Reputation: 1560
Sorry for resuscitating an old question, but this is the first result when searching for how to try to stop parsed html from loading external assets.
I took Nik Ahmad Zainalddin's answer, however there is a weakness in it in that any elements in between <script>
tags get wiped out.
<script>
</script>
Inert text
<script>
</script>
In the above example Inert text
would be removed along with the script tags. I ended up doing the following instead:
html = html.replace(/<\s*(script|iframe)[^>]*>(?:[^<]*<)*?\/\1>/g, "").replace(/(<(\b(img|style|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g, "");
Additionally I added the capability to remove iframe
s.
Hope this helps someone.
Upvotes: 3
Reputation: 709
The following regex replace all occurance of <head>, <link>, <script>, <style>
, including background
and style
attribute from data string returned by ajax load.
html = html.replace(/(<(\b(img|style|script|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g,"");
Test regex: https://regex101.com/r/nB1oP5/1
I wish there is a a better way to work around (other than using regex replace).
Upvotes: 1
Reputation: 39
Using the following way to parse html will load images automatically.
var wrapper = document.createElement('div'),
html = '.....';
wrapper.innerHTML = html;
If use DomParser
to parse html, the images will not be loaded automatically. See https://github.com/panzi/jQuery-Parse-HTML/blob/master/jquery.parsehtml.js for details.
Upvotes: 3
Reputation: 35194
You could either use jQuerys remove()
method to select the image elements
console.log( $(html).find('img').remove().end().find('#c1034') );
or remove then from the HTML string. Something like
console.log( $(html.replace(/<img[^>]*>/g,"")) );
Regarding background images, you could do something like this:
$(html).filter(function() {
return $(this).css('background-image') !== '';
}).remove();
Upvotes: 1
Reputation: 2703
Use regex and remove all <img>
tags
html = html.replace(/<img[^>]*>/g,"");
Upvotes: 17