Reputation: 1133
OK, this is kind of tricky. I have this text:
<something>
<h1> quoiwuqoiuwoi aoiuoisquiooi
<script> dsadsa dsa </script>
Some text here in the middle! =)
<script> dsadsa dsa </script>
</h1>
</something>
I want to get only the content inside the h1, without the script tags. In other words:
<h1> quoiwuqoiuwoi aoiuoisquiooi
Some text here in the middle! =)
</h1>
Including the h1 tags.
Doing some research, I've found out I can get everything between the h1 tags with the following regex:
/<h1([^]*)h1>/
However, I can't find a way to exclude what's between the script tags, including the script tags themselves. Any help would be much appreciated.
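For reference, here is a minimal sketch of what that pattern matches on the sample above, runnable in plain JavaScript. The `sample` string and the lazy variant are illustrative assumptions, not part of the original code. `[^]` is a JavaScript way of matching any character, line breaks included, and because `*` is greedy the match extends to the last `h1>` in the input.

```javascript
// Sample input copied from the text above.
var sample = [
  "<something>",
  "<h1> quoiwuqoiuwoi aoiuoisquiooi",
  "<script> dsadsa dsa </script>",
  "Some text here in the middle! =)",
  "<script> dsadsa dsa </script>",
  "</h1>",
  "</something>"
].join("\n");

// Original pattern: greedy, so it runs to the LAST "h1>" in the input.
var greedy = /<h1([^]*)h1>/i.exec(sample);

// Lazy variant (an assumption, not from the original): stops at the first "</h1>".
var lazy = /<h1\b[^>]*>[^]*?<\/h1>/i.exec(sample);

console.log(greedy[0]); // the <h1>...</h1> block, script tags still included
console.log(lazy[0]);   // the same block here, since the sample has a single h1
```

On a real page with more than one h1, the greedy form would swallow everything between the first and the last one, so the lazy form is probably the safer starting point.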
In case anyone is wondering why I need that, here is a brief explanation:
I'm using this code to scrape some data from a site using Google Spreadsheet:
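function doGet() {
  // Fetch the product page as a string.
  var html = UrlFetchApp.fetch('https://www.nespresso.com/br/pt/product/maquina-de-cafe-espresso-pixie-clips-c60-preta-e-lima-neon-110v').getContentText();
  // Grab everything from the first "<h1" to the last "h1>".
  var regExp = new RegExp("<h1([^]*)h1>", "gi");
  var h1 = regExp.exec(html);
  Logger.log(h1);
  // This is the call that fails when the block contains script or iframe tags.
  var doc = XmlService.parse(h1[0]);
  var root = doc.getRootElement();
  // getElementsByClassName is a helper function defined elsewhere (not shown here).
  var menu = getElementsByClassName(root, 'nes_pdp-title nes_pdp-title-sep-none')[0];
  var output = menu.getText();
  Logger.log(output);
}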
However, it has a problem parsing script tags and iframes. The only solution I could find was to strip them from the code. If anyone has a better solution, I'm all ears.
If I don't remove the script and iframe tags, the code breaks before I can call getElementsByClassName. It breaks at XmlService.parse(): I can only pass it a value that contains neither a script tag nor an iframe tag. Thank you again!
Upvotes: 0
Views: 142
Reputation: 1
Try replacing the .innerHTML of the h1 element using String.prototype.replace() with the RegExp /<script>.*<\/script>/g to match the script tags, including the text within them, followed by .trim():
var h1 = document.getElementsByTagName("something")[0].querySelector("h1");
// Remove single-line <script>...</script> elements, then trim surrounding whitespace.
h1.innerHTML = h1.innerHTML.replace(/<script>.*<\/script>/g, "").trim();
console.log(h1.outerHTML);
<something>
<h1> quoiwuqoiuwoi aoiuoisquiooi
<script> dsadsa dsa </script>
Some text here in the middle! =)
<script> dsadsa dsa </script>
</h1>
</something>
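Note that `.` does not match line breaks in JavaScript, so the pattern above only removes a script element that sits on a single line, and it does not match an opening tag that carries attributes such as `<script type="text/javascript">`. Apps Script also has no `document`, so there the replacement would have to run on the fetched string itself. A minimal sketch under those assumptions follows. `stripElements` and `extractH1` are hypothetical helper names, and regex-based stripping is a workaround rather than a real HTML parser:

```javascript
// Remove every <tag ...>...</tag> element for the given tag names (case-insensitive).
function stripElements(html, tagNames) {
  return tagNames.reduce(function (text, tag) {
    // [\s\S]*? matches across line breaks and stops at the first closing tag.
    var pattern = new RegExp("<" + tag + "\\b[^>]*>[\\s\\S]*?</" + tag + "\\s*>", "gi");
    return text.replace(pattern, "");
  }, html);
}

// Return the first <h1>...</h1> block with script and iframe elements removed,
// or null when the input has no h1.
function extractH1(html) {
  var match = /<h1\b[^>]*>[\s\S]*?<\/h1>/i.exec(html);
  return match ? stripElements(match[0], ["script", "iframe"]) : null;
}

// Sample based on the question's text, with one multi-line script that has an attribute.
var sample =
  "<something>\n" +
  "<h1> quoiwuqoiuwoi aoiuoisquiooi\n" +
  '<script type="text/javascript">\n dsadsa dsa \n</script>\n' +
  "Some text here in the middle! =)\n" +
  "<script> dsadsa dsa </script>\n" +
  "</h1>\n" +
  "</something>";

var cleaned = extractH1(sample);
console.log(cleaned);
```

In the Apps Script function from the question, the result of `extractH1(html)` could then be passed to `XmlService.parse()`, provided the remaining markup is well-formed XML.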
Upvotes: 2