VeroLom

Reputation: 3934

HTML indenting with Perl

I have a variable (e.g. $content) containing HTML code from which the line breaks have already been removed. How can I process this HTML so that a tab of indentation is added after each opening tag and the indent level is decreased after each closing tag?

P.S. I don't want an external script or program (like tidy). I need to do this in my own script.

For example, the source content:

<!DOCTYPE html><html><head><title>test</title></head>   <body>  <h1>hello!</h1><p>It works!</p></body></html>

Desired result:

<!DOCTYPE html>
<html>
    <head>
        <title>test</title>
    </head>
    <body>
        <h1>hello!</h1>
        <p>It works!</p>
    </body>
</html>

Upvotes: 3

Views: 1643

Answers (4)

Ashley

Reputation: 4335

You could also try Marpa::R2::HTML; the source of its companion/demo utility html_fmt shows how to target specific parts of the document for manipulation. I haven't used it and can't try it today for want of Perl 5.10, but it looks like it could be a good match.

Upvotes: 1

reinierpost

Reputation: 8611

An option I've used is CGI::Pretty.

Upvotes: 1

daxim

Reputation: 39158

use HTML::HTML5::Parser qw();
use HTML::HTML5::Writer qw();
use XML::LibXML::PrettyPrint qw();

print HTML::HTML5::Writer->new(
    start_tags => 'force',
    end_tags => 'force',
)->document(
    XML::LibXML::PrettyPrint->new_for_html(
        indent_string => "\t"
    )->pretty_print(
        HTML::HTML5::Parser->new->parse_string(
            '<!DOCTYPE html><html><head><title>test</title></head>   <body>  <h1>hello!</h1><p>It works!</p></body></html>'
        )
    )
);

<!DOCTYPE html><html>
    <head>
        <title>test</title>
    </head>
    <body>
        <h1>hello!</h1>
        <p>It works!</p>
    </body>
</html>

Upvotes: 14

Dave Cross

Reputation: 69314

The manual page says that tidy won't produce output that contains tabs. But it's simple enough to post-process the output to deal with that.

$ tidy -indent foo.html | perl -pe 's|^( +)|"\t" x ((length $1) / 2)|e;'
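The same substitution can be applied inside a script to a string that is already indented with spaces. This is only a sketch: spaces_to_tabs is a hypothetical helper name, and the default width of 2 is an assumption that matches the two-space indent the one-liner expects.

```perl
use strict;
use warnings;

# Hypothetical helper: turn leading runs of spaces into tabs.
# $width is the assumed number of spaces per indent level (default 2).
sub spaces_to_tabs {
    my ( $text, $width ) = @_;
    $width ||= 2;

    # /m lets ^ match at the start of every line, /g handles all lines,
    # /e evaluates the replacement as Perl code.
    $text =~ s{^( +)}{ "\t" x int( length($1) / $width ) }gme;
    return $text;
}

print spaces_to_tabs("<html>\n  <head>\n    <title>test</title>\n  </head>\n</html>\n");
```

Any leftover spaces that do not make up a full indent level are dropped, as they are in the one-liner.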

Using an existing tool has to be a far better solution than inventing it yourself. But if you insist then you should, at least, use a pre-written parser like Perl's HTML::Parser.
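If you do go the HTML::Parser route, a minimal sketch might look like the following. The helper name indent_html, the token layout and the list of void elements are my own assumptions, not part of the module. It keeps an element that contains only text on one line, as in your sample output, and it makes no attempt to preserve whitespace-sensitive content such as <pre>.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Elements that never have a closing tag, so they must not deepen the indent.
my %void = map { $_ => 1 }
    qw(area base br col embed hr img input link meta source track wbr);

# indent_html is a hypothetical helper name, not part of HTML::Parser.
sub indent_html {
    my ($html) = @_;
    my @tokens;    # each entry: [ type, tag name, raw text ]

    my $parser = HTML::Parser->new(
        api_version   => 3,
        start_h       => [ sub { push @tokens, [ 'start', @_ ] }, 'tagname, text' ],
        end_h         => [ sub { push @tokens, [ 'end', @_ ] }, 'tagname, text' ],
        declaration_h => [ sub { push @tokens, [ 'other', '', @_ ] }, 'text' ],
        comment_h     => [ sub { push @tokens, [ 'other', '', @_ ] }, 'text' ],
        text_h        => [
            sub {
                my ($text) = @_;
                $text =~ s/^\s+|\s+$//g;    # drop whitespace between tags
                push @tokens, [ 'text', '', $text ] if length $text;
            },
            'text'
        ],
    );
    $parser->unbroken_text(1);    # deliver each text run in one piece
    $parser->parse($html);
    $parser->eof;

    my ( $out, $depth ) = ( '', 0 );
    for ( my $i = 0; $i < @tokens; $i++ ) {
        my ( $type, $tag, $text ) = @{ $tokens[$i] };
        my ( $next, $after ) = @tokens[ $i + 1, $i + 2 ];
        my $inline = 0;

        # <tag>text</tag> stays on a single line.
        if (   $type eq 'start' && $next && $after
            && $next->[0] eq 'text'
            && $after->[0] eq 'end' && $after->[1] eq $tag )
        {
            $text .= $next->[2] . $after->[2];
            $i += 2;
            $inline = 1;
        }

        $depth-- if $type eq 'end' && $depth > 0;
        $out .= ( "\t" x $depth ) . $text . "\n";
        $depth++ if $type eq 'start' && !$inline && !$void{$tag};
    }
    return $out;
}

my $content = '<!DOCTYPE html><html><head><title>test</title></head>   <body>  <h1>hello!</h1><p>It works!</p></body></html>';
print indent_html($content);
```

The tokens are collected first and printed afterwards so that the loop can look ahead two tokens; doing the indentation directly inside the handlers would make the single-line case harder to detect.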

I should also point out that your specification of the problem seems to be incorrect. You say you want to add a tab after each opening tag. But your sample output doesn't do that for the <title>, <h1> or <p> tags.

Upvotes: 1
