Minification: Using regex to remove linebreaks from JavaScript code

Question

For the pure purpose of obfuscation, the first three lines seem to strip unnecessary line breaks from the script pretty nicely.

Can anyone tell me what lines 1-4 actually do? The only thing I know from trial and error is that if I comment out the fourth line, the site works; if I leave it in place, the site breaks.


Is there a better way to remove one or more line breaks from JavaScript?

Amal Murali · Accepted Answer

Dissection of all four regular expressions

Let's try and dissect each one of the regular expressions.

First regex

$buffer = preg_replace('/([;])\s+/', '$1', $buffer);

Explanation

(      # beginning of the first capturing group
 [;]   # match the literal character ';'
)      # ending of the first capturing group
\s+    # one or more whitespace characters (including newlines)

The above regular expression removes any whitespace that immediately follows a semicolon. ([;]) is a capturing group wrapped around a character class that matches only ;. When a match is found, the captured semicolon is stored in a backreference so it can be used later. For example, in the string foo;   bar the expression matches the ; together with the three spaces after it. The replacement pattern here is $1, which means the entire matched string is replaced with just the semicolon, giving foo;bar.
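To make the first line easy to try out, here is a minimal sketch of it ported to Python's re module (an assumption for illustration; the question uses PHP's preg_replace). Python writes the backreference as \g<1> instead of $1, and the function name strip_after_semicolons is hypothetical.

```python
import re

# Python port of the first PHP line:
#   preg_replace('/([;])\s+/', '$1', $buffer)
# Group 1 captures the semicolon; \s+ swallows the whitespace after it,
# newlines included. Replacing with group 1 keeps ';' and drops the rest.
SEMICOLON_WS = re.compile(r"([;])\s+")


def strip_after_semicolons(buffer: str) -> str:
    """Hypothetical helper: remove whitespace that directly follows a ';'."""
    return SEMICOLON_WS.sub(r"\g<1>", buffer)


src = "var a = 1;\n    var b = 2;\n"
print(repr(strip_after_semicolons(src)))  # prints 'var a = 1;var b = 2;'
```

Only whitespace after a semicolon is touched; the spaces around = stay, which is why this line alone is not a full minifier.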

Second regex

$buffer = preg_replace('/([}])\s+(else)/', '$1else', $buffer);

Explanation

(      # beginning of the first capturing group
 [}]   # match the literal character '}'
)      # ending of the first capturing group
\s+    # one or more whitespace characters
(else) # match and capture 'else'

The above regex removes any whitespace between a closing curly brace (}) and the keyword else. The replacement pattern here is $1else, which means the matched string, whitespace included, is replaced by what the first capturing group ([}]) captured (which is just the closing brace) followed by the keyword else. Nothing much to it.
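Here is the second line as a small sketch, again ported to Python's re for illustration rather than the original PHP. The name join_brace_else is hypothetical. The replacement rebuilds }else from the captured brace plus the literal keyword.

```python
import re

# Python port of the second PHP line:
#   preg_replace('/([}])\s+(else)/', '$1else', $buffer)
# Group 1 is the closing brace, group 2 the keyword 'else'.
BRACE_ELSE = re.compile(r"([}])\s+(else)")


def join_brace_else(buffer: str) -> str:
    """Hypothetical helper: pull 'else' up against the preceding '}'."""
    return BRACE_ELSE.sub(r"\g<1>else", buffer)


src = "if (x) {\n  a();\n}\nelse {\n  b();\n}"
print(join_brace_else(src))
```

Only the gap between } and else disappears; the line breaks inside the braces are left for the other substitutions.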

Third regex

$buffer = preg_replace('/([}])\s+(var)/', '$1;var', $buffer);

Explanation

(      # beginning of the first capturing group
 [}]   # match the literal character '}'
)      # ending of the first capturing group
\s+    # one or more whitespace characters
(var)  # match and capture 'var'

This is the same as the previous regex, with two differences: the keyword is var instead of else, and the replacement pattern $1;var puts a semicolon between the closing brace and var. The semicolon is optional in JavaScript when statements sit on separate lines. Once the line breaks are removed and multiple statements end up on a single line, however, the interpreter has no way to tell where one statement ends and the next begins, so a ; is needed to terminate each statement.

Fourth regex

$buffer = preg_replace('/([{};])\s+(\$)/', '$1\$', $buffer);

Explanation

(      # beginning of the first capturing group
 [{};] # match the literal character '{' or '}' or ';'
)      # ending of the first capturing group
\s+    # one or more whitespace characters
(      # beginning of the second capturing group
 \$    # match the literal character '$'
)      # ending of the second capturing group

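The replacement pattern here is $1\$, which means the entire matched string is replaced with what was matched by the first capturing group ([{};]), followed by a literal $ character. In other words, the line removes any whitespace between {, } or ; and a following $. The second capturing group is never referenced in the replacement.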
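The fourth line, ported to Python for illustration (join_before_dollar is a hypothetical name), also hints at one plausible reason the site breaks. The asker's script isn't shown, so this is only a guess. Unlike the third line, this one does not insert a semicolon. If a } closes an expression, such as a function assigned to a variable, and the next line starts with $, the line break was the only thing ending that statement. After the rewrite, JavaScript sees }$(...) on one line and reports a syntax error.

```python
import re

# Python port of the fourth PHP line:
#   preg_replace('/([{};])\s+(\$)/', '$1\$', $buffer)
# Group 1 is '{', '}' or ';'; group 2 is a literal '$'. In a Python
# replacement string '$' is already literal, so "\g<1>$" is equivalent.
BRACE_DOLLAR = re.compile(r"([{};])\s+(\$)")


def join_before_dollar(buffer: str) -> str:
    """Hypothetical helper: remove whitespace between '{', '}' or ';' and '$'."""
    return BRACE_DOLLAR.sub(r"\g<1>$", buffer)


src = "var f = function () {\n  return 1;\n}\n$('#a').hide();"
# The '}\n$' gap collapses to '}$', which is no longer valid JavaScript.
print(join_before_dollar(src))
```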

Sidenote

This answer was only meant to explain the four regexes and what they do. The expressions could be improved a lot, but I'm not going into that, because regex replacement is not the correct approach. As Qtax points out in the comments, you really should use a proper JS minifier for this task. You might want to check out Google's Closure Compiler, which looks pretty neat.
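A quick way to see why regex replacement is the wrong tool: the patterns have no idea where string literals begin and end. The sketch below chains all four substitutions in the question's order (ported from PHP to Python; naive_minify and RULES are illustrative names, not part of any library) and runs them over a snippet with a semicolon inside a string.

```python
import re

# The four PHP substitutions from the question, ported to Python and
# applied in the same order. An illustration of the approach's limits,
# not a minifier.
RULES = [
    (re.compile(r"([;])\s+"), r"\g<1>"),
    (re.compile(r"([}])\s+(else)"), r"\g<1>else"),
    (re.compile(r"([}])\s+(var)"), r"\g<1>;var"),
    (re.compile(r"([{};])\s+(\$)"), r"\g<1>$"),
]


def naive_minify(buffer: str) -> str:
    """Apply each (pattern, replacement) rule to the whole buffer in turn."""
    for pattern, replacement in RULES:
        buffer = pattern.sub(replacement, buffer)
    return buffer


src = 'var msg = "a;   b";\nvar n = 1;\n'
print(naive_minify(src))  # prints var msg = "a;b";var n = 1;
```

The string "a;   b" silently becomes "a;b", changing the program's behavior. A real minifier parses the code first, so it leaves string contents alone.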

If you're still confused about how it works, don't worry: regexes can be difficult to learn at first. I suggest this website: http://regularexpressions.info. It is a pretty decent resource for learning regular expressions. If you're looking for a book, you might want to check out Mastering Regular Expressions by Jeffrey Friedl.
