Reputation: 1382
According to the author of htmlcompressor.com, line breaks inside attribute values cannot be removed because they have semantic meaning.
Here is the particular example:
<meta name='description' content='Foo lets you save and share all your
web bookmarks / favorites in one place. It is free with no advertising for life, and
has straight forward privacy controls.'>
Removing the return characters gives:
<meta name='description' content='Foo lets you save and share all your web bookmarks / favorites in one place. It is free with no advertising for life, and has straight forward privacy controls.'>
This is a single line, which is what I want to send to the browser.
I want to do this for all my HTML using some string manipulation. Is this possible to do or are there other cases where a return character has meaning? Is there a way to differentiate?
Upvotes: 3
Views: 348
Reputation: 53
While the specification is correct that the content attribute is CDATA, a webmaster may read the value of any attribute, such as the "content" of the "meta" tag in the given example, via JavaScript, and compressing the attribute value would alter the expected result.
So the author of htmlcompressor.com is correct that line breaks have semantic meaning for the purpose of compression.
<meta id="m1" name="item1" content="Sample stuff:
1. This text is multiline on purpose.
2. And the author expects it to remain this way after compression.
So yes, it does matter...">
The same meta tag compressed:
<meta id="m2" name="item2" content="Sample stuff: 1. This text is multiline on purpose. 2. And the author expects it to remain this way after compression. So yes, it does matter...">
And to show the difference:
<script>
alert('"'
+ document.getElementById('m1').content
+ '"\n\n---------------\n\n"'
+ document.getElementById('m2').content + '"'
);
</script>
AFAIK, the goal of that site is to compress documents without altering the resulting layout or functionality.
Live example: http://jsfiddle.net/7Qb74/
Upvotes: 0
Reputation: 155075
According to the HTML 4.01 specification (http://www.w3.org/TR/html4/struct/global.html#h-7.4.4.2), the content="" attribute of the <meta /> element is CDATA, which means that whitespace is not significant:
CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
- Replace character entities with characters,
- Ignore line feeds,
- Replace each carriage return or tab with a single space.
- User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
So it looks like the author of htmlcompressor.com is wrong.
Anyway, despite dire warnings to the contrary, you can probably get away with using a regular expression to fix this.
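The rules quoted above can be sketched in a few lines of JavaScript. This is only an illustration of the quoted text, not what any particular browser does: the function name normalizeCdataAttribute is made up for this example, entity replacement is left out, and the optional trimming of leading and trailing white space is applied.

```javascript
// Hypothetical helper: applies the HTML 4.01 CDATA rules quoted above
// to a raw attribute value. Character entity replacement is not handled.
function normalizeCdataAttribute(value) {
  return value
    .replace(/\n/g, '')       // ignore line feeds
    .replace(/[\r\t]/g, ' ')  // each carriage return or tab becomes one space
    .trim();                  // optional: drop leading/trailing white space
}

console.log(normalizeCdataAttribute(' my\tval\n '));
```

Note that a bare line feed is dropped rather than turned into a space, so under these rules two words separated only by a line feed would be joined.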
I've forgotten the syntax to combine "match only this group, and replace in this sub-region" in regex, but this hack works:
This simple regex will capture the content of the content="" attribute (it uses [^'] rather than . because . does not match line breaks by default):
<meta[^>]+content='([^']*)'>
Once you've got the content, you can do a straightforward '\r', '\n', ' ' -> ' ' replacement.
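As a rough sketch of the capture-then-replace idea, something like the following might work in JavaScript. The function name collapseMetaContent is hypothetical, and the sketch assumes the attribute is quoted, that no earlier attribute value in the tag contains a ">", and that each line break (with any surrounding white space) should become a single space, as in the question's desired output.

```javascript
// Hypothetical helper: collapses line breaks inside the content="..." or
// content='...' attribute of <meta> tags, leaving the rest of the HTML alone.
function collapseMetaContent(html) {
  // Group 1: everything up to "content=", group 2: the quote character,
  // group 3: the attribute value ([\s\S] also matches line breaks).
  const metaContent = /(<meta\b[^>]*?\bcontent=)(["'])([\s\S]*?)\2/gi;
  return html.replace(metaContent, (match, prefix, quote, value) =>
    // Replace each run of line breaks, plus adjacent white space, with one space.
    prefix + quote + value.replace(/\s*[\r\n]+\s*/g, ' ') + quote);
}

const sample = "<meta name='description' content='Foo lets you save\nand share'>";
console.log(collapseMetaContent(sample));
```

Because the value is rewritten inside a callback, text outside meta tags is never touched. For anything beyond simple, well-formed markup, a real HTML parser is the safer choice.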
Upvotes: 2