Reputation: 3934
I have a variable (e.g. $content) holding HTML code with the line breaks already removed. How can I process it so that a tab indent is added after each opening tag and the indent level is decreased after each closing tag?
P.S. I don't want an external script or program (like tidy). I need to do this in my own script.
For example: source content:
<!DOCTYPE html><html><head><title>test</title></head> <body> <h1>hello!</h1><p>It works!</p></body></html>
needed result:
<!DOCTYPE html>
<html>
	<head>
		<title>test</title>
	</head>
	<body>
		<h1>hello!</h1>
		<p>It works!</p>
	</body>
</html>
Upvotes: 3
Views: 1643
Reputation: 4335
You could also try Marpa::R2::HTML. The source of its companion demo utility, html_fmt, shows how to target specific parts of the document for manipulation. I haven't used it and can't try it today because I don't have Perl 5.10 at hand, but it looks like it could be a good match.
Upvotes: 1
Reputation: 39158
use strict;
use warnings;
use HTML::HTML5::Parser qw();
use HTML::HTML5::Writer qw();
use XML::LibXML::PrettyPrint qw();

# Parse the HTML into a DOM, re-indent it with tabs, then serialize it
# with every start and end tag written out explicitly.
print HTML::HTML5::Writer->new(
    start_tags => 'force',
    end_tags   => 'force',
)->document(
    XML::LibXML::PrettyPrint->new_for_html(
        indent_string => "\t"
    )->pretty_print(
        HTML::HTML5::Parser->new->parse_string(
            '<!DOCTYPE html><html><head><title>test</title></head> <body> <h1>hello!</h1><p>It works!</p></body></html>'
        )
    )
);
<!DOCTYPE html><html>
	<head>
		<title>test</title>
	</head>
	<body>
		<h1>hello!</h1>
		<p>It works!</p>
	</body>
</html>
Upvotes: 14
Reputation: 69314
The manual page says that tidy won't produce output that contains tabs. But it's simple enough to post-process the output to deal with that.
$ tidy -indent foo.html | perl -pe 's|^( +)|"\t" x ((length $1) / 2)|e;'
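If you would rather do that post-processing inside your own script than in a pipeline, the same substitution can be wrapped in a small function. This is only a sketch: spaces_to_tabs is a made-up name, and the default of two spaces per level simply mirrors the division by 2 in the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tidy or any module): turns each run of
# leading spaces into tabs, one tab per $width spaces. Leftover spaces that
# do not fill a whole level are dropped, as in the one-liner.
sub spaces_to_tabs {
    my ($text, $width) = @_;
    $width ||= 2;    # assumption: two spaces per indent level
    $text =~ s{^( +)}{ "\t" x int(length($1) / $width) }gme;
    return $text;
}

print spaces_to_tabs("<html>\n  <head>\n    <title>test</title>\n  </head>\n</html>\n");
```

The /m flag makes ^ match at the start of every line, so the function works on a whole multi-line string rather than one line at a time.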
Using an existing tool has to be a far better solution than inventing it yourself. But if you insist then you should, at least, use a pre-written parser like Perl's HTML::Parser.
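If you do go the HTML::Parser route, a rough sketch might look like the following. Treat it as a starting point, not a finished formatter: indent_html and the %VOID list are names I made up for illustration, comments and processing instructions are silently dropped, and whitespace-sensitive elements such as <pre> get no special treatment. An element that contains only text is kept on one line, which is what your sample output does.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Elements without a closing tag; they must not raise the indent level.
my %VOID = map { $_ => 1 }
    qw(area base br col embed hr img input link meta source track wbr);

# Hypothetical helper: returns $html with one tab per nesting level.
sub indent_html {
    my ($html) = @_;
    my @lines;        # finished output lines
    my $depth = 0;    # current nesting level
    my $open;         # index of a start-tag line that has no child element yet

    my $emit = sub { push @lines, ("\t" x $depth) . $_[0] };

    my $parser = HTML::Parser->new(
        api_version   => 3,
        declaration_h => [ sub { $emit->($_[0]); undef $open }, 'text' ],
        start_h       => [ sub {
            my ($tag, $text) = @_;
            $emit->($text);
            if ($VOID{$tag}) { undef $open }
            else             { $open = $#lines; $depth++ }
        }, 'tagname, text' ],
        text_h => [ sub {
            my ($text) = @_;
            return unless $text =~ /\S/;    # skip whitespace between tags
            if (defined $open) { $lines[$open] .= $text }   # stay on the tag's line
            else               { $emit->($text) }
        }, 'text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $depth-- if $depth > 0;
            if (defined $open) { $lines[$open] .= $text; undef $open }
            else               { $emit->($text) }
        }, 'tagname, text' ],
    );
    $parser->unbroken_text(1);    # deliver each text run as a single chunk
    $parser->parse($html);
    $parser->eof;
    return join("\n", @lines) . "\n";
}

print indent_html('<!DOCTYPE html><html><head><title>test</title></head> <body> <h1>hello!</h1><p>It works!</p></body></html>');
```

The parser does the tokenizing, so the script only has to track the nesting depth and decide where line breaks go.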
I should also point out that your specification of the problem seems to be incorrect. You say you want to add a tab after each opening tag, but your sample output doesn't do that for the <title>, <h1> or <p> tags.
Upvotes: 1