Reputation: 267077
I'm participating in one of the Code Golf competitions where the smaller your file size is, the better.
Rather than manually removing all whitespace, etc., I'm looking for a program or website which will take a file, remove all whitespace (including new lines) and return a compact version of the file. Any ideas?
Upvotes: 2
Views: 4078
Reputation: 27313
You could use:
sed -E 's/\s\s+/ /g' yourfile > yourpackedfile
There is also this online tool.
You can even do it in PHP (how marvelous is life):
$data = file_get_contents('foobar.php');
$data = preg_replace('/\s\s+/', ' ', $data);
file_put_contents('foobar2.php', $data);
Note that this does not protect string literals: in something like $bar = ' asd aa a'; any runs of whitespace inside the quotes are collapsed as well.
That might be a problem depending on what you are doing. The online tool seems to handle this properly.
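To see the caveat in action, here is a minimal sketch that runs the same pattern over a tiny sample script. The sample source is made up for illustration; only the regular expression comes from the snippet above.

```php
<?php
// Hypothetical sample source: a string literal that contains runs of spaces.
$source = "<?php\n\$bar = ' asd   aa  a';\necho   \$bar;\n";

// Same pattern as above: collapse any run of two or more whitespace characters.
$packed = preg_replace('/\s\s+/', ' ', $source);

// The layout whitespace shrinks, but so does the spacing inside the literal.
// Single newlines are left alone because the pattern needs at least two characters.
echo $packed;
```

So the value of $bar in the packed file is no longer the value in the original file.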
Upvotes: 10
Reputation: 5919
This is a PHP function that will do the work for you:
function compress_php_src($src) {
    // Whitespace to the left and right of these tokens can be dropped
    static $IW = array(
        T_CONCAT_EQUAL, // .=
        T_DOUBLE_ARROW, // =>
        T_BOOLEAN_AND, // &&
        T_BOOLEAN_OR, // ||
        T_IS_EQUAL, // ==
        T_IS_NOT_EQUAL, // != or <>
        T_IS_SMALLER_OR_EQUAL, // <=
        T_IS_GREATER_OR_EQUAL, // >=
        T_INC, // ++
        T_DEC, // --
        T_PLUS_EQUAL, // +=
        T_MINUS_EQUAL, // -=
        T_MUL_EQUAL, // *=
        T_DIV_EQUAL, // /=
        T_IS_IDENTICAL, // ===
        T_IS_NOT_IDENTICAL, // !==
        T_DOUBLE_COLON, // ::
        T_PAAMAYIM_NEKUDOTAYIM, // ::
        T_OBJECT_OPERATOR, // ->
        T_DOLLAR_OPEN_CURLY_BRACES, // ${
        T_AND_EQUAL, // &=
        T_MOD_EQUAL, // %=
        T_XOR_EQUAL, // ^=
        T_OR_EQUAL, // |=
        T_SL, // <<
        T_SR, // >>
        T_SL_EQUAL, // <<=
        T_SR_EQUAL, // >>=
    );
    if(is_file($src)) {
        if(!$src = file_get_contents($src)) {
            return false;
        }
    }
    $tokens = token_get_all($src);
    $new = "";
    $c = sizeof($tokens);
    $iw = false; // Ignore whitespace
    $ih = false; // In HEREDOC
    $ls = ""; // Last sign
    $ot = null; // Open tag
    for($i = 0; $i < $c; $i++) {
        $token = $tokens[$i];
        if(is_array($token)) {
            list($tn, $ts) = $token; // tokens: number, string, line
            $tname = token_name($tn);
            if($tn == T_INLINE_HTML) {
                $new .= $ts;
                $iw = false;
            }
            else {
                if($tn == T_OPEN_TAG) {
                    if(strpos($ts, " ") || strpos($ts, "\n") || strpos($ts, "\t") || strpos($ts, "\r")) {
                        $ts = rtrim($ts);
                    }
                    $ts .= " ";
                    $new .= $ts;
                    $ot = T_OPEN_TAG;
                    $iw = true;
                } elseif($tn == T_OPEN_TAG_WITH_ECHO) {
                    $new .= $ts;
                    $ot = T_OPEN_TAG_WITH_ECHO;
                    $iw = true;
                } elseif($tn == T_CLOSE_TAG) {
                    if($ot == T_OPEN_TAG_WITH_ECHO) {
                        $new = rtrim($new, "; ");
                    } else {
                        $ts = " ".$ts;
                    }
                    $new .= $ts;
                    $ot = null;
                    $iw = false;
                } elseif(in_array($tn, $IW)) {
                    $new .= $ts;
                    $iw = true;
                } elseif($tn == T_CONSTANT_ENCAPSED_STRING
                    || $tn == T_ENCAPSED_AND_WHITESPACE)
                {
                    if($ts[0] == '"') {
                        $ts = addcslashes($ts, "\n\t\r");
                    }
                    $new .= $ts;
                    $iw = true;
                } elseif($tn == T_WHITESPACE) {
                    $nt = @$tokens[$i+1];
                    if(!$iw && (!is_string($nt) || $nt == '$') && !in_array($nt[0], $IW)) {
                        $new .= " ";
                    }
                    $iw = false;
                } elseif($tn == T_START_HEREDOC) {
                    $new .= "<<<S\n";
                    $iw = false;
                    $ih = true; // in HEREDOC
                } elseif($tn == T_END_HEREDOC) {
                    $new .= "S;";
                    $iw = true;
                    $ih = false; // in HEREDOC
                    for($j = $i+1; $j < $c; $j++) {
                        if(is_string($tokens[$j]) && $tokens[$j] == ";") {
                            $i = $j;
                            break;
                        } else if($tokens[$j][0] == T_CLOSE_TAG) {
                            break;
                        }
                    }
                } elseif($tn == T_COMMENT || $tn == T_DOC_COMMENT) {
                    $iw = true;
                } else {
                    if(!$ih) {
                        $ts = strtolower($ts);
                    }
                    $new .= $ts;
                    $iw = false;
                }
            }
            $ls = "";
        }
        else {
            if(($token != ";" && $token != ":") || $ls != $token) {
                $new .= $token;
                $ls = $token;
            }
            $iw = true;
        }
    }
    return $new;
}
// This is an example
$src = file_get_contents('foobar.php');
file_put_contents('foobar3.php',compress_php_src($src));
Upvotes: 1
Reputation: 546065
Run php -w on it:
php -w myfile.php
Unlike a regular expression, this is smart enough to leave strings alone, and it removes comments too.
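If you would rather do this from inside a script, PHP exposes similar behaviour through the built-in php_strip_whitespace() function. A rough sketch, assuming a throwaway temporary file with made-up contents:

```php
<?php
// Write a small sample script to a temporary file (hypothetical content).
$path = tempnam(sys_get_temp_dir(), 'golf');
file_put_contents(
    $path,
    "<?php\n// a comment\n\$bar = ' asd   aa  a';\n\necho   \$bar;\n"
);

// php_strip_whitespace() takes a file name and returns the source
// with comments and unnecessary whitespace removed.
$stripped = php_strip_whitespace($path);
unlink($path);

echo $stripped, "\n";
```

The comment is gone and the spacing inside the string literal is left as it was. Exactly which newlines survive may differ from what you expect, so check the output size before submitting.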
Upvotes: 0
Reputation: 150
Notepad++ is quite a nice editor if you are on Windows, and it has a lot of predefined macros, trimming down code and removing whitespace among them.
It can do regular expressions and has a plethora of features to help the code hacker or script kiddie.
Upvotes: 0
Reputation: 382726
If your code editor supports regular expressions, you can try this:
Find this: [\r\n]{2,}
Replace with this: \n
Then Replace All
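For reference, the same find/replace can be tried in PHP to see what it does and does not remove. This is only a sketch with made-up input; your editor's regex flavour may treat \r and \n slightly differently.

```php
<?php
// Hypothetical input: a Windows-style blank line, then two Unix-style blank lines.
$text = "a\r\n\r\nb\n\n\nc\n";

// Same pattern as above: any run of two or more CR/LF characters becomes one "\n".
$out = preg_replace('/[\r\n]{2,}/', "\n", $text);

var_dump($out === "a\nb\nc\n"); // bool(true)
```

Note that this only collapses blank lines (and turns a lone \r\n into \n); spaces, tabs and single line breaks are untouched, so it is a first pass rather than a full minifier.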
Upvotes: 0
Reputation: 354566
In PowerShell (v2) this can be done with the following little snippet:
(-join(gc my_file))-replace"\s"
or longer:
(-join (Get-Content my_file)) -replace "\s"
It will join all lines together and remove all spaces and tabs.
However, for some languages you probably don't want to do that. In PowerShell, for example, you don't need semicolons unless you put multiple statements on a single line, so code like
while (1) {
"Hello World"
$x++
}
would become
while(1){"HelloWorld"$x++}
when applying the aforementioned statements naïvely. That changes both the meaning of the program and whether it is syntactically valid. It is probably not much of a concern for golfed solutions that are purely numerical, but the problem with lines being joined together still remains, sadly. Just putting a semicolon between each line doesn't actually help either.
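The effect is not specific to PowerShell. As a rough illustration, the same naive join-and-strip can be reproduced in PHP on the lines from the example above (the PowerShell source is treated as plain text here, nothing is executed):

```php
<?php
// The four lines of the PowerShell example, held as plain strings.
$lines = ['while (1) {', '    "Hello World"', '    $x++', '}'];

// Join the lines without a separator, then delete every whitespace character.
$naive = preg_replace('/\s/', '', implode('', $lines));

echo $naive, "\n"; // while(1){"HelloWorld"$x++}
```

Both problems show up in the output: the space inside the string literal is gone, and the two statements are fused with nothing separating them.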
Upvotes: 1