Reputation: 5384
So, I have this as input file, temp.html:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</body></html>
I was hoping I could pretty-print and indent tags hierarchically by using xmlstarlet:
$ xmlstarlet fo --html --recover --indent-spaces 2 --omit-decl temp.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</div></body>
</html>
... however, as is obvious from the command output above, it only indents some tags (e.g. it split <html><body> and indented those tags properly), but fails on others (e.g. it kept </div></div></div></div> on a single line).
Is it possible to persuade/set up xmlstarlet to split off and indent all tags, one tag per line, with proper indentation?
$ xmlstarlet --version
srcinfo-cache
compiled against libxml2 2.9.10, linked with 21209
compiled against libxslt 1.1.34, linked with 10142
Upvotes: 0
Views: 38
Reputation: 1801
First convert the input file to well-formed XML (a closing </div> is missing), then format that XML in a second pass. By default, format uses an indentation of 2 spaces, so --indent-spaces 2 can be left out.
xmlstarlet -q format --html --recover --omit-decl temp.html |
xmlstarlet format --omit-decl
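If this is needed more than once, the two passes above could be wrapped in a small shell function. This is only a sketch: the name pretty_html is hypothetical, and it assumes xmlstarlet is on PATH and that the first pass produces well-formed XML for the input at hand.

```shell
# Hypothetical helper: pretty_html FILE
# Pass 1 parses FILE as HTML, recovering from errors such as the missing
# </div>, and writes the repaired tree back out; -q silences error messages.
# Pass 2 re-reads that output as XML and indents it (2 spaces by default).
pretty_html() {
    xmlstarlet -q format --html --recover --omit-decl "$1" |
        xmlstarlet format --omit-decl
}
```

It would then be called as pretty_html temp.html > temp.pretty.html. Elements that contain text, such as the <span>, may still stay on one line together with their content.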
Upvotes: 1
Reputation: 5384
Well, it seems tidy works here (found it via "A command-line HTML pretty-printer: Making messy HTML readable"):
$ tidy --version
HTML Tidy for Windows version 5.8.0
$ tidy -indent -wrap 160 -ashtml -utf8 temp.html
line 3 column 1 - Warning: missing </div>
line 2 column 7 - Warning: inserting missing 'title' element
Info: Doctype given is "-//W3C//DTD HTML 4.0 Transitional//EN"
Info: Document content looks like HTML 4.01 Strict
Tidy found 2 warnings and 0 errors!
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.8.0">
<title></title>
</head>
<body>
<div id="ext-comp-1725" class=
"x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window"
style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl">
<div class="x-window-tr">
<div class="x-window-tc">
<div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536">
<div class="x-tool x-tool-icon x-tool-close">
</div>
</div><span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
About HTML Tidy: https://github.com/htacg/tidy-html5
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
Validate your HTML documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium
Do you speak a language other than English, or a different variant of
English? Consider helping us to localize HTML Tidy. For details please see
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
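Most of the extra output above (the warnings, the informational footer, and the inserted generator <meta> element) can probably be suppressed with tidy's own options. A sketch, assuming the tidy 5.x option names; tidy_quiet is a hypothetical wrapper name, not part of tidy:

```shell
# Hypothetical wrapper: tidy_quiet FILE
tidy_quiet() {
    # -quiet             : omit the summary and the "About HTML Tidy" footer
    # --show-warnings no : do not print the per-line warnings
    # --tidy-mark no     : do not add <meta name="generator" ...> to <head>
    tidy -quiet -indent -wrap 160 -ashtml -utf8 \
        --show-warnings no --tidy-mark no "$1"
}
```

Note that tidy normally exits with status 1 when it found warnings, so a script calling tidy_quiet temp.html should not treat that status alone as a failure.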
Upvotes: 1