Ali

Reputation: 267077

Easiest way to remove all whitespace from a code file?

I'm participating in one of the Code Golf competitions where the smaller your file size is, the better.

Rather than manually removing all whitespace, etc., I'm looking for a program or website that will take a file, remove all whitespace (including newlines) and return a compact version of the file. Any ideas?

Upvotes: 2

Views: 4078

Answers (7)

RageZ

Reputation: 27313

You could use GNU sed (-E makes + a quantifier; sed works line by line, so newlines are kept):

sed -E 's/\s\s+/ /g' yourfile > yourpackedfile

There is also this online tool.

You can even do it in PHP (how marvelous is life):

$data = file_get_contents('foobar.php');
$data = preg_replace('/\s\s+/', ' ', $data);
file_put_contents('foobar2.php', $data);

Note that this doesn't respect string literals: in something like $bar = ' asd aa a'; any run of whitespace inside the quotes is collapsed too, which may or may not be a problem depending on what you are doing. The online tool seems to handle this properly.
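
To see both effects concretely, here is a rough Python equivalent of the preg_replace call above (collapse_whitespace is a hypothetical helper name, and Python's re module stands in for PCRE). Runs of two or more whitespace characters become one space, single spaces and a lone trailing newline survive, and the run inside the quoted string is collapsed as well:

```python
import re


def collapse_whitespace(src: str) -> str:
    # Every run of two or more whitespace characters becomes a single space,
    # like preg_replace('/\s\s+/', ' ', $data).
    return re.sub(r"\s\s+", " ", src)


code = "$a = 1;\n\n    $bar = 'x    y';\n"
print(repr(collapse_whitespace(code)))
```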

Upvotes: 10

pollux1er

Reputation: 5919

This is a PHP function that will do the work for you:

function compress_php_src($src) {

    // Whitespace on either side of these tokens can be dropped
    static $IW = array(
        T_CONCAT_EQUAL,             // .=
        T_DOUBLE_ARROW,             // =>
        T_BOOLEAN_AND,              // &&
        T_BOOLEAN_OR,               // ||
        T_IS_EQUAL,                 // ==
        T_IS_NOT_EQUAL,             // != or <>
        T_IS_SMALLER_OR_EQUAL,      // <=
        T_IS_GREATER_OR_EQUAL,      // >=
        T_INC,                      // ++
        T_DEC,                      // --
        T_PLUS_EQUAL,               // +=
        T_MINUS_EQUAL,              // -=
        T_MUL_EQUAL,                // *=
        T_DIV_EQUAL,                // /=
        T_IS_IDENTICAL,             // ===
        T_IS_NOT_IDENTICAL,         // !==
        T_DOUBLE_COLON,             // ::
        T_PAAMAYIM_NEKUDOTAYIM,     // ::
        T_OBJECT_OPERATOR,          // ->
        T_DOLLAR_OPEN_CURLY_BRACES, // ${
        T_AND_EQUAL,                // &=
        T_MOD_EQUAL,                // %=
        T_XOR_EQUAL,                // ^=
        T_OR_EQUAL,                 // |=
        T_SL,                       // <<
        T_SR,                       // >>
        T_SL_EQUAL,                 // <<=
        T_SR_EQUAL,                 // >>=
    );

    if(is_file($src)) {
        if(!$src = file_get_contents($src)) {
            return false;
        }
    }

    $tokens = token_get_all($src);

    $new = "";
    $c = sizeof($tokens);
    $iw = false; // Ignore whitespace
    $ih = false; // In HEREDOC
    $ls = "";    // Last sign
    $ot = null;  // Open tag
    for($i = 0; $i < $c; $i++) {
        $token = $tokens[$i];

        if(is_array($token)) {
            list($tn, $ts) = $token; // token id and source text (line number unused)

            if($tn == T_INLINE_HTML) {
                $new .= $ts;
                $iw = false;
            }
            else {
                if($tn == T_OPEN_TAG) {

                    if(strpos($ts, " ") || strpos($ts, "\n") || strpos($ts, "\t") || strpos($ts, "\r")) {
                        $ts = rtrim($ts);
                    }

                    $ts .= " ";
                    $new .= $ts;
                    $ot = T_OPEN_TAG;
                    $iw = true;

                } elseif($tn == T_OPEN_TAG_WITH_ECHO) {

                    $new .= $ts;
                    $ot = T_OPEN_TAG_WITH_ECHO;
                    $iw = true;

                } elseif($tn == T_CLOSE_TAG) {

                    if($ot == T_OPEN_TAG_WITH_ECHO) {
                        $new = rtrim($new, "; ");
                    } else {
                        $ts = " ".$ts;
                    }
                    $new .= $ts;
                    $ot = null;
                    $iw = false;

                } elseif(in_array($tn, $IW)) {

                    $new .= $ts;
                    $iw = true;

                } elseif($tn == T_CONSTANT_ENCAPSED_STRING
                       || $tn == T_ENCAPSED_AND_WHITESPACE)
                {

                    if($ts[0] == '"') {
                        $ts = addcslashes($ts, "\n\t\r");
                    }
                    $new .= $ts;
                    $iw = true;

                } elseif($tn == T_WHITESPACE) {

                    $nt = isset($tokens[$i+1]) ? $tokens[$i+1] : null; // next token
                    if(!$iw && $nt !== null && (!is_string($nt) || $nt == '$') && !in_array($nt[0], $IW)) {
                        $new .= " ";
                    }
                    $iw = false;

                } elseif($tn == T_START_HEREDOC) {

                    $new .= "<<<S\n";
                    $iw = false;
                    $ih = true; // in HEREDOC

                } elseif($tn == T_END_HEREDOC) {

                    $new .= "S;";
                    $iw = true;
                    $ih = false; // in HEREDOC
                    for($j = $i+1; $j < $c; $j++) {
                        if(is_string($tokens[$j]) && $tokens[$j] == ";") {
                            $i = $j;
                            break;
                        } else if($tokens[$j][0] == T_CLOSE_TAG) {
                            break;
                        }
                    }

                } elseif($tn == T_COMMENT || $tn == T_DOC_COMMENT) {

                    $iw = true;

                } else {

                    // Keywords, identifiers, variables, numbers, heredoc text:
                    // copied unchanged, since PHP variables and constants are case-sensitive
                    $new .= $ts;
                    $iw = false;
                }
            }
            $ls = "";

        }
        else {
            // Single-character tokens such as ; ( ) { } are copied as-is; a
            // repeated ";" must be kept because for(;;) needs both semicolons
            $new .= $token;
            $iw = true;
        }
    }
    return $new;
}
// This is an example
$src = file_get_contents('foobar.php');
file_put_contents('foobar3.php',compress_php_src($src));
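
The key idea in the function above is to decide, token by token, whether a space is actually needed. As a much simpler illustration of that idea, here is a character-level Python sketch (strip_outside_strings and needs_space are hypothetical names). It assumes a C-like syntax where strings are delimited by ' or " with backslash escapes, and it does not handle comments, heredocs, or most operator pairs, so it is no substitute for a real tokenizer like token_get_all:

```python
def _is_word(c: str) -> bool:
    # Characters that would fuse into one token if written next to each other.
    return c.isalnum() or c in "_$"


def needs_space(prev: str, nxt: str) -> bool:
    """True if deleting the whitespace between prev and nxt would merge two tokens."""
    return (_is_word(prev) and _is_word(nxt)) or (prev == nxt and prev in "+-")


def strip_outside_strings(src: str) -> str:
    """Remove whitespace outside '...' and "..." literals, keeping a single
    space only where needs_space() says one is required."""
    out = []
    i, n = 0, len(src)
    saw_space = False  # whitespace was skipped since the last emitted piece
    while i < n:
        ch = src[i]
        if ch.isspace():
            saw_space = True
            i += 1
            continue
        if saw_space and out and needs_space(out[-1][-1], ch):
            out.append(" ")
        saw_space = False
        if ch in "'\"":
            # Copy the whole literal verbatim, stepping over backslash escapes.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_outside_strings('if ($x == 1) {\n    echo "a  b";\n    return $y;\n}\n'))
```

A space survives only where two word characters (or two identical + or - signs) would otherwise merge, as in return $y, while the double space inside "a  b" is left alone.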

Upvotes: 1

nickf

Reputation: 546065

Run php -w on it!

php -w myfile.php > packed.php

It prints the stripped source to standard output. Unlike a regular expression, it is smart enough to leave strings alone, and it removes comments too.
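
To illustrate why comment removal has to be string-aware, here is a rough Python sketch that drops //, # and /* */ comments while copying quoted literals untouched (strip_comments is a hypothetical name). It is only an approximation of what php -w does, and it ignores PHP details such as heredocs, a // comment ending at ?>, and #[ attributes:

```python
def strip_comments(src: str) -> str:
    """Drop //, # and /* */ comments found outside '...' and "..." literals."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in "'\"":
            # Copy a quoted literal untouched so "//" or "#" inside it survives.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep the tokens on either side apart
        elif ch == "#" or src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end  # the newline itself is kept
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_comments('$a = 1; // one\n$b = "//not"; # two\n/* three */$c = 3;\n'))
```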

Upvotes: 0

myk_raniu

Reputation: 150

Notepad++ is quite a nice editor if you are on Windows. It has a lot of predefined operations, including ones for trimming down code and removing whitespace (see the Edit > Blank Operations menu).

It can do regular expressions and has a plethora of features to help the code hacker or script kiddie.

Notepad++ website

Upvotes: 0

Sarfraz

Reputation: 382726

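If your code editor supports regular expressions, you can try the following. Note that it only collapses runs of consecutive line breaks (blank lines) into a single newline; spaces and tabs are left as they are: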

Find this: [\r\n]{2,}
Replace with this: \n
Then Replace All
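
A quick way to check what that pattern does is to run it through Python's re module (collapse_line_breaks is a hypothetical name). It only squeezes runs of \r/\n characters into one newline, so spaces and tabs are untouched, and because a Windows line break \r\n is two characters long, even a single CRLF gets rewritten to \n:

```python
import re


def collapse_line_breaks(text: str) -> str:
    # Same pattern as the editor recipe: two or more \r/\n characters in a row -> one "\n".
    return re.sub(r"[\r\n]{2,}", "\n", text)


print(repr(collapse_line_breaks("a\n\n\nb  c\r\nd")))
```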

Upvotes: 0

Useless

Reputation: 67743

$ tr -d ' \n' <oldfile >newfile
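
For comparison, the same deletion in Python via str.translate (delete_chars is a hypothetical name). As with the tr command, tabs and carriage returns survive unless you list them, and spaces inside string literals are deleted too:

```python
def delete_chars(text: str, chars: str = " \n") -> str:
    # Like `tr -d ' \n'`: remove every occurrence of each character in `chars`.
    return text.translate(str.maketrans("", "", chars))


print(repr(delete_chars("x = 'a b';\n\tdone\n")))
```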

Upvotes: 2

Joey

Reputation: 354566

In PowerShell (v2) this can be done with the following little snippet:

(-join(gc my_file))-replace"\s"

or longer:

(-join (Get-Content my_file)) -replace "\s"

It joins all lines together and removes all remaining whitespace, such as spaces and tabs.

However, for some languages you probably don't want to do that. In PowerShell, for example, you don't need semicolons unless you put multiple statements on a single line, so code like

while (1) {
    "Hello World"
    $x++
}

would become

while(1){"HelloWorld"$x++}

when the statements above are applied naïvely. This changes both the meaning of the program ("Hello World" loses its space) and its syntactic correctness. It is probably not much of a concern for mostly numerical golfed solutions, but the problem of lines being joined together remains, sadly. Just putting a semicolon between each pair of lines doesn't actually help either.
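
The naive transformation can be reproduced in Python to confirm the result above (join_and_strip is a hypothetical name; the list of strings stands in for what Get-Content returns, one element per line with the line breaks already removed):

```python
import re
from typing import List


def join_and_strip(lines: List[str]) -> str:
    # -join with no separator concatenates the lines; -replace "\s" with no
    # replacement string deletes every whitespace character.
    return re.sub(r"\s", "", "".join(lines))


script = ['while (1) {', '    "Hello World"', '    $x++', '}']
print(join_and_strip(script))
```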

Upvotes: 1
