Reputation: 175
I have a page with some HTML rendered into it. I want to get the rendered page as text, but also keep the line breaks. In addition, if relevant, I'm looking for an extended solution that also supports lists (using spaces and •), tables (using spaces, but with no borders) and similar cases.
I'm looking for a JavaScript solution, either client-side or server-side.
Please note: not every element in the page corresponds to a new line (e.g., some divs are laid out inline and some create new lines).
For example, the snippet below is the HTML, and the expected output is the text exactly as it appears when the snippet is run.
#inline {
  display: flex;
  flex-direction: row;
}
#inline div {
  margin-right: 5px;
}
#notInline {
  display: flex;
  flex-direction: column;
}
<div>
  <div id='inline'><div>some</div><div>divs</div><div>inline</div></div>
  <div id='notInline'><div>some</div><div>divs</div><div>on top of each other</div></div>
</div>
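For the extended cases mentioned in the question (lists using spaces and •, tables padded with spaces and no borders), the formatting step on its own could look like the minimal sketch below. It assumes the list items and table cells have already been extracted as strings; `formatList`, `formatTable`, the nested-array shape for sub-lists, and the two-space indent and column gap are all hypothetical choices, not an existing API.

```javascript
// Hypothetical helpers: format already-extracted content as plain text.

// List items are strings; a nested array holds the sub-items of a nested list.
function formatList(items, depth = 0) {
  return items
    .map(item => Array.isArray(item)
      ? formatList(item, depth + 1)           // nested list: indent one more level
      : '  '.repeat(depth) + '• ' + item)     // two spaces per level, then a bullet
    .join('\n');
}

// Rows are arrays of cell strings; each column is padded to its widest cell.
function formatTable(rows, gap = 2) {
  const widths = [];
  for (const row of rows) {
    row.forEach((cell, i) => { widths[i] = Math.max(widths[i] || 0, cell.length); });
  }
  return rows
    .map(row => row.map((cell, i) => cell.padEnd(widths[i])).join(' '.repeat(gap)).trimEnd())
    .join('\n');
}

console.log(formatList(['fruit', ['apple', 'kiwi'], 'vegetables']));
console.log(formatTable([['Name', 'Qty'], ['apple', '3'], ['kiwi', '12']]));
```

Padding by `cell.length` assumes monospaced output and roughly one character per code unit; wide or combining characters would need extra handling.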
Upvotes: 0
Views: 222
Reputation: 14165
You can try this. The first part collects the inline text, the second the "on top of each other" text:
// Inline divs: join their text with spaces, then end the line
var inlineOutput = Array.from(document.querySelector('#inline').children)
  .map(e => e.textContent)
  .join(' ') + '\n';
console.log(inlineOutput);
// Stacked divs: put each child's text on its own line
var noInLineOutput = Array.from(document.querySelector('#notInline').children)
  .map(e => e.textContent + '\n')
  .join('');
console.log(noInLineOutput);
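The same idea can be generalized so it does not depend on the two hard-coded ids. This is a minimal sketch, not part of the original answer: `layoutText`, `browserDisplay`, and the `'inline' | 'block' | 'row' | 'column' | 'none'` categories are hypothetical names. The walker asks a callback how each element is laid out, which keeps it testable outside a browser. One detail matters for the question's markup: browsers blockify flex items, so `getComputedStyle` reports the row's children as `display: block`. That is why the sketch keeps everything inside a `flex-direction: row` container on one line instead of trusting each child's own display value.

```javascript
// Hypothetical sketch: turn a DOM-like tree into text, starting a new line
// wherever elements are stacked vertically. getDisplay(el) is an assumed
// callback returning 'inline', 'block', 'row' (flex row), 'column'
// (flex column) or 'none', so the walker itself needs no browser.
function layoutText(root, getDisplay) {
  const lines = [''];

  // Start a new line unless the current one is still empty.
  function breakLine() {
    if (lines[lines.length - 1] !== '') lines.push('');
  }

  // Append text to the current line, separated from earlier text by a space.
  function append(text) {
    const last = lines.length - 1;
    lines[last] = lines[last] ? lines[last] + ' ' + text : text;
  }

  function walk(node, inRow) {
    if (node.nodeType === 3) {            // text node: collapse whitespace
      const text = node.textContent.replace(/\s+/g, ' ').trim();
      if (text) append(text);
      return;
    }
    if (node.nodeType !== 1) return;      // ignore comments and the like
    const display = getDisplay(node);
    if (display === 'none') return;       // hidden elements produce no text
    // Everything inside a flex row stays on one line (simplification).
    const isBlock = !inRow && display !== 'inline';
    if (isBlock) breakLine();
    for (const child of node.childNodes) walk(child, inRow || display === 'row');
    if (isBlock) breakLine();
  }

  walk(root, false);
  return lines.filter(line => line !== '').join('\n');
}

// Browser-only callback built on computed styles (defined but not called in Node).
function browserDisplay(element) {
  const { display, flexDirection } = getComputedStyle(element);
  if (display === 'none') return 'none';
  if (display === 'flex' || display === 'inline-flex') {
    return flexDirection.startsWith('row') ? 'row' : 'column';
  }
  return display.startsWith('inline') ? 'inline' : 'block';
}

// Plain-object stand-ins for DOM nodes, mirroring the question's markup with
// the display values a browser would compute (flex items report 'block').
const textNode = s => ({ nodeType: 3, textContent: s, childNodes: [] });
const elementNode = (display, ...childNodes) => ({ nodeType: 1, display, childNodes });
const page = elementNode('block',
  elementNode('row', elementNode('block', textNode('some')),
    elementNode('block', textNode('divs')), elementNode('block', textNode('inline'))),
  elementNode('column', elementNode('block', textNode('some')),
    elementNode('block', textNode('divs')), elementNode('block', textNode('on top of each other'))));

console.log(layoutText(page, node => node.display));
// In a browser: console.log(layoutText(document.body, browserDisplay));
```

Known simplifications: adjacent inline pieces are always joined with a space (so `<b>foo</b>bar` becomes `foo bar`), nested layouts inside a row are flattened onto one line, and lists and tables get no special treatment.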
Upvotes: 1
Reputation: 485
There's a JS scraper called Cheerio that could extract all the text for you; I've never used it, though. It gives you access to the DOM, so you can gather parts of whichever page you need. Here's a tutorial that uses it with Node.
I'm not sure if this is what you're looking for, but if they're your own pages, you could probably write a function that walks everything in the DOM, splits at the opening and closing angle brackets, grabs the text in between, and switches to one item per line when it sees the notInline id.
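The tag-splitting idea can be sketched without any library, so it also runs in Node. This is a minimal sketch under stated assumptions, not a robust parser: `tagSplitText` and its `stackedId` parameter are hypothetical names, and it only knows about the `notInline` switch, not about CSS, so text outside that container is joined with spaces (apart from the line break before the container starts).

```javascript
// Hypothetical sketch of the "split at the angle brackets" idea: tokenize the
// markup with a regex, keep a stack of open elements, and put each direct
// child of the element with the given id (default 'notInline') on its own line.
// Assumes simple, well-formed markup you control: no comments, scripts or
// void tags like <br>, since a regex is not a general HTML parser.
function tagSplitText(html, stackedId = 'notInline') {
  const tokens = html.match(/<[^>]*>|[^<]+/g) || [];
  const isStacked = new RegExp(`id=['"]${stackedId}['"]`);
  const stack = [];   // one boolean per open element: is it the stacked container?
  let out = '';

  for (const token of tokens) {
    if (token.startsWith('</')) {
      stack.pop();
      // A direct child of the stacked container just closed: end its line.
      if (stack.length && stack[stack.length - 1]) out += '\n';
    } else if (token.startsWith('<')) {
      const stacked = isStacked.test(token);
      // The stacked container itself starts on a fresh line.
      if (stacked && out && !out.endsWith('\n')) out += '\n';
      stack.push(stacked);
    } else {
      const text = token.replace(/\s+/g, ' ').trim();
      // Pieces on the same line are separated by a single space.
      if (text) out += (out && !out.endsWith('\n') ? ' ' : '') + text;
    }
  }
  return out.trim();
}

const html = `<div>
<div id='inline'><div>some</div><div>divs</div><div>inline</div></div>
<div id='notInline'><div>some</div><div>divs</div><div>on top of each other</div></div>
</div>`;
console.log(tagSplitText(html));
```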
Upvotes: 0