Teifion

Reputation: 111109

What should I consider before minifying HTML?

I've googled around but can't find any HTML minification scripts.

It occurred to me that maybe there is nothing more to HTML minification than removing all unneeded whitespace.

Am I missing something or has my Google Fu been lost?

Upvotes: 13

Views: 8635

Answers (12)

gensn

Reputation: 45

If you have Node.js installed and you are on Windows, you can create a .bat file that minifies every HTML file in a folder and writes the output to a min subfolder.

  1. Open the console and run: npm install html-minifier -g
  2. Create the .bat file with the contents below. Don't forget to change the path in the cd command; editing the path in the .bat file is easier than copying the file into each folder.
  3. In the console, go to the folder that contains the .bat file and run it.

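cd the_destination_folder

rem Create the output folder if it does not exist yet.
if not exist min mkdir min

rem "call" returns control to this script after each run (npm installs html-minifier as a .cmd shim).
for %%A in (*.html) do call html-minifier --collapse-whitespace --remove-comments --remove-optional-tags "%%~nxA" -o "min\%%~nxA"

pause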
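For anyone not on Windows, the same loop can be sketched in Python. This is only a rough equivalent under some assumptions: html-minifier is already installed globally and on the PATH, the helper names build_commands and minify_folder are made up for this example, and the flags are the same ones used in the .bat file above.

```python
from pathlib import Path
import subprocess

# Same options as the .bat file above.
FLAGS = ["--collapse-whitespace", "--remove-comments", "--remove-optional-tags"]


def build_commands(folder: Path) -> list[list[str]]:
    """Build one html-minifier command line per *.html file in `folder`."""
    out_dir = folder / "min"
    return [
        ["html-minifier", *FLAGS, str(src), "-o", str(out_dir / src.name)]
        for src in sorted(folder.glob("*.html"))
    ]


def minify_folder(folder: Path) -> None:
    """Run html-minifier on every HTML file, writing results to folder/min."""
    (folder / "min").mkdir(exist_ok=True)
    for cmd in build_commands(folder):
        # Assumes the html-minifier executable can be found on the PATH.
        subprocess.run(cmd, check=True)
```

Calling minify_folder(Path("the_destination_folder")) would then do the work. On Windows the npm shim is a .cmd file, so the executable name may need to be "html-minifier.cmd" there.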

Upvotes: 1

B.F.

Reputation: 477

Here is a minifier for HTML5 written in PHP.

<?PHP
$in=file_get_contents('path/to/source.html');

//Strips spaces if there are more than one.
$in=preg_replace('/\s{2,}/m',' ',$in);
//trim
$in=preg_replace('/^\s+|\s+$/m','',$in);
/*Strips spaces between tags. 
Use (&nbsp; or &shy; or better) padding or margin if necessary, otherwise the html
parser appends a one space textnode.*/  
$in=preg_replace('/ ?> < ?/','><',$in);
//Removes tag end slash.
$in=preg_replace('@ ?/>@','>',$in);
//Removes HTML comments except conditional IE comments.
$in=preg_replace('/<!--[^\[]*?-->/','',$in);
//Removes quotes where possible.
$in=preg_replace('/="([^ \'"\=><]+)"/','=$1',$in);
$in=preg_replace("/='([^ '\"\=><]+)'/",'=$1',$in);

file_put_contents('path/to/min.html',$in);
?>

After that you have shorter, single-line HTML code.

It would be cleaner to put the regular expressions in an array, but be careful to escape the backslashes.

Upvotes: 0

Martin Tournoij

Reputation: 27852

I've used this regexp for years without any problems: s/>\s*</></g

In Python: re.sub(r'>\s*<', '><', html)

Or in PHP: preg_replace('/>\s*</', '><', $html);

This removes all whitespace between tags and nowhere else. It is fairly safe but not perfect: there are situations where it will break things, but they're rare.
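To make that concrete, here is a small runnable sketch of the Python variant. The function name strip_between_tags and the sample strings are just for illustration. The second sample shows one of the rare cases where it breaks: a line break between two tags inside a pre block is lost.

```python
import re


def strip_between_tags(html: str) -> str:
    """Remove whitespace that sits directly between a '>' and the next '<'."""
    return re.sub(r">\s*<", "><", html)


sample = "<div>\n    <a>link1</a>\n    <a>link2</a>\n</div>"
print(strip_between_tags(sample))
# → <div><a>link1</a><a>link2</a></div>

# Whitespace inside text is untouched ...
print(strip_between_tags("<p>some   text</p>"))
# → <p>some   text</p>

# ... but a meaningful line break between tags inside <pre> disappears.
print(strip_between_tags("<pre><b>a</b>\n<b>b</b></pre>"))
# → <pre><b>a</b><b>b</b></pre>
```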

My main reason for doing this isn't speed/file size, but because the whitespace often introduces a, well, space. This would be okay, but when you start mucking about in your DOM with Javascript, spaces are often lost, creating (minor) layout differences.

Consider:

<div>
    <a>link1</a>
    <a>link2</a>
</div>

There's a space between the links, but now I do something like:

$('div').append('<a>link3</a>')

And there's no space ... I need to manually add the space in my JS, which is fairly ugly & error-prone IMHO.

Upvotes: 1

Basic Bridge

Reputation: 1911

I recently found a PHP-based script that minifies your site's HTML, inline CSS and inline JavaScript on the fly. It is called Dynamic website compressor.

Upvotes: 0

Paul D. Waite

Reputation: 98936

I haven’t tried it yet, but htmlcompressor is an HTML minifier, if you fancy giving one a try.

Upvotes: 1

geowa4

Reputation: 41853

You can find some good references here to things like HTML tidy and others.

If you don't want to use one of those options, Prototype has a means to clean the whitespace in the DOM. You could do that on your own and copy it via 'View Generated Source' in the Firefox extension Web Developer Toolbar. Then you can replace the original html with prototype's fix. Sorry for not making that apparent nickf.

(I recommend the first link)

Upvotes: 1

Robinb

Reputation: 384

Couldn't JavaScript be used as a decompressor for a compressed HTML string? For instance, keep a DEV build in the uncompressed format, run a 'publish' script that compresses the DEV build for production, and attach a JavaScript decompressor to the HTML source (with the whitespace and such removed as before).

Bandwidth on the server would be reduced, but the downside is a lot more strain on the client, which has to decompress the string into HTML. JavaScript would also need to be enabled and able to parse the decompressed string as HTML.

I am not saying it's a definite solution, but it is something that might work. It all depends on whether you care only about bandwidth, regardless of the user's JavaScript settings, system spec and so on.

Otherwise look for obfuscation scripts; a simple Google search produced http://tinyurl.com/phpob. Depending on what you're looking for, there should be a software package available.

If I am on the wrong lines, please shout and I will see what else I can do.

Good Luck!

Upvotes: 0

Mr. Shiny and New 安宇

Reputation: 13898

You have to be careful when removing anything from HTML, as it's a fragile language. Depending on how your pages are coded, some of that whitespace may be significant; for example, if you have CSS styles such as white-space: pre, you may need to keep the whitespace. There are also numerous browser bugs, so basically every character in an HTML file might be there to satisfy some requirement or appease some browser.

In my opinion your best bet is to design the pages well with CSS techniques (I was recently able to take an important page on the site I work for and reduce its size by 50% just by recoding it using CSS instead of tables and nested style="..." attributes). Then use gzip to reduce the size of your pages for browsers that understand gzip. This saves bandwidth while preserving the structure of the HTML.
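A quick way to check this on your own pages is to compare byte counts before and after gzip. The sketch below uses an invented, very repetitive sample page, so the sizes it prints are only illustrative; real pages will compress differently. The helper gzip_size is a name made up for this example.

```python
import gzip
import re


def gzip_size(text: str) -> int:
    """Size in bytes of the UTF-8 text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


# Invented sample: an indented list with 200 items.
page = (
    "<ul>\n"
    + "".join(f'    <li class="item">Item {i}</li>\n' for i in range(200))
    + "</ul>\n"
)

# Whitespace-only "minification", for comparison.
stripped = re.sub(r">\s*<", "><", page)

sizes = {
    "raw": len(page.encode("utf-8")),
    "stripped": len(stripped.encode("utf-8")),
    "raw + gzip": gzip_size(page),
    "stripped + gzip": gzip_size(stripped),
}

for label, size in sizes.items():
    print(f"{label:>16}: {size} bytes")
```

On markup like this, gzip alone usually removes far more bytes than stripping whitespace does, which is why stripping afterwards tends to add little.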

Upvotes: 28

ChrisW

Reputation: 56123

Sometimes, depending on the enclosing tags and/or on the CSS, whitespace may be significant.
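One way to respect the enclosing tags is to leave whitespace-sensitive elements alone and only collapse whitespace elsewhere. The sketch below is a rough illustration with assumptions: the list of preserved tags is my own pick and not exhaustive, the name collapse_outside_preserved is invented, and a regex cannot see whitespace made significant by CSS (for example white-space: pre on a div).

```python
import re

# Elements whose content is left untouched (illustrative, not exhaustive).
PRESERVE = re.compile(
    r"<(pre|textarea|script|style)\b.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)


def collapse_outside_preserved(html: str) -> str:
    """Collapse whitespace runs to one space, except inside preserved elements."""
    out = []
    pos = 0
    for match in PRESERVE.finditer(html):
        # Collapse the ordinary markup that precedes the preserved element.
        out.append(re.sub(r"\s+", " ", html[pos:match.start()]))
        # Copy the preserved element through unchanged.
        out.append(match.group(0))
        pos = match.end()
    out.append(re.sub(r"\s+", " ", html[pos:]))
    return "".join(out)


sample = "<p>some   text</p>\n\n<pre>line 1\n    line 2</pre>\n<p>more</p>"
print(collapse_outside_preserved(sample))
```

The pre block keeps its line break and indentation, while the whitespace around it shrinks to single spaces.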

Upvotes: 7

Jon Adams

Reputation: 25147

Outside of HTML Tidy/removing white space as the other answers mentioned, there isn't much.

Beyond that, it is more of a manual task: pulling style attributes out into CSS (hopefully you're not using FONT tags, etc.) and using fewer tags and attributes where possible. For example, instead of embedding <strong> tags in an element, use CSS to give the whole element font-weight: bold, unless of course it makes semantic sense to use <strong>.

Upvotes: 2

cowgod

Reputation: 8676

There's a pretty lengthy discussion on this Wordpress blog about this topic. You can find a very lengthy proposed solution using PHP and HTML Tidy there.

Upvotes: 1

Tamas Czinege

Reputation: 121444

Yes, I guess it's pretty much removing whitespace and comments. You cannot replace identifiers with shorter ones as you can in JavaScript, since chances are that CSS classes or JavaScript will depend on those identifiers.

Also, you should be careful when removing whitespace: make sure that at least one whitespace character is always left, otherwise allyourtextwilllooklikethis.
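The difference between deleting whitespace and collapsing it is easy to see in a couple of lines of Python (the sample string is made up):

```python
import re

text = "<p>all   your\n   text</p>"

# Deleting every whitespace run glues the words together.
deleted = re.sub(r"\s+", "", text)
print(deleted)
# → <p>allyourtext</p>

# Collapsing each run to a single space keeps the words apart.
collapsed = re.sub(r"\s+", " ", text)
print(collapsed)
# → <p>all your text</p>
```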

Upvotes: 1
