Reputation: 19
My HTML skills are slightly above newbie level and my CSS is improving daily, so I don't even know whether this can be done. Although I have no Python, PHP, Ruby, JavaScript, Perl, Fortran... buzzer! (just want to make sure you're still awake, big guy), I am willing to learn. The slice below is the first 970 characters (about 0.27 percent) of the 365,937 characters in that page's single style block alone. It is these and other Wall of Advertising Code blocks that I want to delete:
<style type="text/css">#Ad2, #AdText, #Ad_Top, #Adbanner, #Adfox_Banner, #Ads, #AdvertFieldBottom, #AdvertFieldCenter, #AdvertFieldTop, #Advertisement, #AdvertisingTopLine, #BanHolder28-1, #BannerGBottom, #BannerGCenter, #BannerGIMG, #BannerGTop, #BannerH2Left, #BannerHIMG, #BannerHLeft, #BannerUnderBroChat, #JaboxAdBarOuter, #METABAR_IFRAME, #MarketGidComposite1001, #PopUpWnd, #PopWin, #PopWin_popupsu_notds, #RichBanner_center, #__adIframe, #ad-200, #ad-slides, #ad2, #ad4, #ad7, #adHeadBanner, #adL, #adP, #adWrapper, #ad_help_link, #ad_hide_mask_ad_0, #ad_hide_mask_ad_1, #adbns, #adf_notifiers_wrap, #adsCSS, #advRightBox, #advbroker_place_1, #advbroker_place_10, #advbroker_place_2, #advbroker_place_3, #advbroker_place_4, #advbroker_place_5 { display: none!important; }
#advbroker_place_6, #advbroker_place_7, #advbroker_place_8, #advbroker_place_9, #advertbox, #advertising_floater, #advertisment, #advrich, #advunder-top, #adzerk3, #app-banners, . . .</style>
I frequently save HTML pages for my own private reference, and I'd like to know whether there are any offline¹ widgets, apps, macros, or techniques I could use to strip these blocks out.
I'd like to keep the visual style of the author's page but remove the bloat, and I figure if the towering talent on Stack Overflow can't find a solution, then nobody can. I have a rudimentary knowledge of regular expressions and, with the exception of Notepad++, I am a regular user of the assets below:
Can it be done? Thanks everyone. :)
¹ For privacy reasons, I'd like to avoid online services.
Upvotes: 1
Views: 1456
Reputation: 19
Okay, this is crude, but as Wild Beard mentioned, there just isn't an easy way to get rid of this ad crap. Use a fixed-pitch (monospace) font and a robust text editor with line-numbering options (I did this in TextPad, but I'm pretty sure Don Ho's free Notepad++ could do it as well).
You should now have a large block of text, left-aligned and single-spaced.
When sorting on the first character, you don't want line #5 to be grouped with line #50001, so pad the line numbers with leading zeros (00005 rather than 5).
What you're doing is grabbing the longest of the advertising lines and isolating them for deletion. Be prepared to do this more than once. And don't sweat getting the document back to its original order. That's why you numbered the lines.
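To see why the line numbers need a fixed width (so that 5 doesn't sort next to 50001), the number, sort, delete, re-sort routine can be mimicked in a few lines of JavaScript run with Node.js. This is a minimal illustration, not the TextPad workflow itself; stripLongestLines and its maxLength cutoff are hypothetical, and in practice you would choose the cutoff by looking at how long the advertising lines actually are.

```javascript
// Hypothetical helper that mimics the editor routine on a string:
// number the lines, drop the overly long ones, restore the original order.
function stripLongestLines(text, maxLength) {
  const lines = text.split("\n");

  // Give every line number the same width (e.g. 00005), so a plain text sort
  // on the prefix puts the lines back in their original order.
  const width = String(lines.length).length;
  const numbered = lines.map(
    (line, i) => String(i + 1).padStart(width, "0") + " " + line
  );

  // Sorting longest-first is not needed for the filter to work; it mirrors
  // the editor step where the long advertising lines rise to the top.
  const kept = numbered
    .sort((a, b) => b.length - a.length)
    .filter((line) => line.length - width - 1 <= maxLength);

  // Restore the original order by sorting on the padded prefix, then strip it.
  return kept
    .sort()
    .map((line) => line.slice(width + 1))
    .join("\n");
}
```

For example, stripLongestLines(pageText, 2000) would keep every line of 2,000 characters or fewer in its original position and drop the rest.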
Upvotes: 1
Reputation: 8728
If you are seeing these strange style definitions in your browser's shadow root, the CSS is being added dynamically to every website by the AdGuard AdBlocker extension. The extension sets all kinds of "#banner..." or "#ad..." selectors to "display: none !important".
https://chrome.google.com/webstore/detail/adguard-adblocker/
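If a saved page does contain one of these injected blocks, one offline option is to drop every <style> block whose rules do nothing except hide elements. Below is a minimal sketch, assuming Node.js. removeHidingStyles is a hypothetical helper, not part of AdGuard, and because it uses regular expressions instead of a real CSS parser it is only a heuristic: it would also remove a legitimate style block made up solely of display: none rules.

```javascript
// Hypothetical helper: removes <style> blocks in which every rule has the form
// "selectors { display: none!important; }" and leaves all other blocks alone.
function removeHidingStyles(html) {
  return html.replace(/<style\b[^>]*>([\s\S]*?)<\/style>/gi, (block, css) => {
    // Split the CSS at each closing brace and ignore empty leftovers.
    const rules = css
      .split("}")
      .map((rule) => rule.trim())
      .filter(Boolean);

    // A rule counts as "hiding only" if its declaration is just display: none !important.
    const hidingOnly =
      rules.length > 0 &&
      rules.every((rule) =>
        /\{\s*display\s*:\s*none\s*!\s*important\s*;?\s*$/i.test(rule)
      );

    return hidingOnly ? "" : block;
  });
}
```

A block that mixes hiding rules with real styling (for example a body color) is kept untouched, so the page's visual style survives.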
Upvotes: 0
Reputation: 2927
Here is a simple proof of concept. You'll still need to work out how to read the file in and write it back out after removing the elements or styles.
However, as I mentioned in my comment, this will also match #additional-info. I did add a check that the element is an iframe, which should cut down on false positives a bit.
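// Collect every "#ad..." id selector from all <style> blocks on the page.
var matched_classes = [],
    regex = /#ad[\w-]+/gi,
    style = document.querySelectorAll('style');
style.forEach(function(item) {
    // match() returns null when a block has no matches, so fall back to [].
    matched_classes = matched_classes.concat(item.innerHTML.match(regex) || []);
});
matched_classes.forEach(function(item) {
    var el = document.getElementById(item.replace('#', ''));
    // Only remove iframes, to limit false positives such as #additional-info.
    if (el !== null && el.nodeName === 'IFRAME') {
        el.parentElement.removeChild(el);
    }
});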
<style type="text/css">#Ad2, #AdText, #Ad_Top, #Adbanner</style>
<iframe id="Ad2" src="https://www.w3schools.com">
</iframe>
<div id="AdText">Something not removed hopefully.</div>
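To see what the selector pattern actually catches, here is the extraction step on its own as a sketch you can run with Node.js. extractAdIds is a hypothetical helper; its pattern, /#ad[\w-]*/gi, includes the hyphen so that ids such as #ad-200 are captured (a plain \w+ stops at the hyphen). It also picks up #additional-info, and it would likewise match hex colors such as #adadad, which is why checking for an iframe before removing anything helps.

```javascript
// Hypothetical helper: returns the ids (without the leading '#') of every
// selector that starts with "#ad", case-insensitively, with duplicates removed.
function extractAdIds(cssText) {
  const matches = cssText.match(/#ad[\w-]*/gi) || [];
  return [...new Set(matches.map((selector) => selector.slice(1)))];
}

const css =
  "#Ad2, #AdText, #ad-200, #additional-info, #PopWin { display: none!important; }";
extractAdIds(css); // returns ["Ad2", "AdText", "ad-200", "additional-info"]
```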
As you mentioned in your comment, you have no idea how to implement this, and there's no simple and easy way to do it. You could look into creating files with JavaScript, but JavaScript likely isn't going to be your best bet. Of the languages listed in your question, Python may be your best bet; sadly, I don't know Python.
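You could paste the script below into the bottom of each of your files and open the file in your browser; it should remove any iframe element whose id matches an "#ad..." selector in a <style> tag. Then copy the updated markup from the browser's developer tools (in Chrome, right-click the <html> element in the Elements panel and choose Copy > Copy outerHTML) and save it as the new file. Note that View Source shows the original file, not the modified page. That's a bit tedious, but for someone without any experience it may be the best place to start, short of me writing out the entire solution for you.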
<script>
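// Collect every "#ad..." id selector from all <style> blocks on the page.
var matched_classes = [],
    regex = /#ad[\w-]+/gi,
    style = document.querySelectorAll('style');
style.forEach(function(item) {
    // match() returns null when a block has no matches, so fall back to [].
    matched_classes = matched_classes.concat(item.innerHTML.match(regex) || []);
});
matched_classes.forEach(function(item) {
    var el = document.getElementById(item.replace('#', ''));
    // Only remove iframes, to limit false positives such as #additional-info.
    if (el !== null && el.nodeName === 'IFRAME') {
        el.parentElement.removeChild(el);
    }
});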
</script>
Upvotes: 0