Ethan

Reputation: 375

How can I normalize an HTML document downloaded from a web page in a Node.js environment (for example, adding the missing tbody)?

I downloaded a page from a website. It has a table tag without a tbody, but when I open the page in Chrome I can see a tbody tag. Evidently Chrome has normalized the HTML document. I want to normalize the downloaded document with an npm package so that I get the same result as in Chrome.

Which npm package can do this? Thanks.

Upvotes: 0

Views: 127

Answers (1)

Maverick

Reputation: 886

As far as I know, there isn't a tool like this, and for a reason.

The "normalization" you are talking about isn't required for the HTML to be valid: the thead, tbody, and tfoot tags are optional.

But why do browsers do it?

Browsers add the tbody element because the HTML parsing algorithm inserts it while building the DOM tree.

Here is how the parser works:

8.2.5.4.9 The "in table" insertion mode

A start tag whose tag name is one of: "td", "th", "tr"

Insert an HTML element for a "tbody" start tag token with no attributes, then switch the insertion mode to "in table body".

More here: https://www.w3.org/TR/html5/syntax.html#parsing-main-intable


By the way, a really easy way to do it is to use search and replace.

Search: <table>

Replace: <table><tbody>

And after:

Search: </table>

Replace: </tbody></table>
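The two search-and-replace steps above can be sketched in plain Node.js with no packages. This is only a rough illustration, not what a browser does: the function name addTbody is made up for this example, and it assumes tables are not nested, that attribute values contain no ">" character, and that a table with a caption or colgroup is acceptable to wrap as a whole (a real parser would place tbody after those elements). Unlike the literal search, it also matches table tags that carry attributes and skips tables that already have row groups.

```javascript
// Naive sketch: wrap the contents of each <table> in <tbody>.
// Assumptions (not from the original answer): tables are not nested,
// and a table that already contains <thead>, <tbody>, or <tfoot>
// is left untouched.
function addTbody(html) {
  // Match an opening <table ...> tag, its contents (non-greedy),
  // and the closing </table> tag.
  const tablePattern = /(<table\b[^>]*>)([\s\S]*?)(<\/table>)/gi;
  return html.replace(tablePattern, (match, open, body, close) => {
    // Skip tables that already have explicit row groups.
    if (/<(thead|tbody|tfoot)\b/i.test(body)) {
      return match;
    }
    return `${open}<tbody>${body}</tbody>${close}`;
  });
}

const input = '<table class="data"><tr><td>1</td></tr></table>';
console.log(addTbody(input));
// prints: <table class="data"><tbody><tr><td>1</td></tr></tbody></table>
```

Because this works on text rather than on a parsed tree, it will not reproduce every fix-up a browser applies to malformed markup. It only covers the missing tbody case.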

Upvotes: 1
