Mert Aşan

Reputation: 436

How to strip comments from PHP/HTML source? (with sed/awk/grep etc.)

I would like to remove comments from the source code of PHP files using a shell script or AppleScript. (I'm running it from the hooks of CodeKit, an OS X app.)

I tried to do it with sed and grep, but the modern regex syntax I'm used to from PHP and JavaScript doesn't work there.

https://regex101.com/r/awpFe0/1/

The demo above should be enough to show what I want to do.

I need a working version on the command line.

I found this, but I can't get exactly the result I want: https://stackoverflow.com/a/13062682/6320082

I've been working on it for two days. Please help me.

Example contents:

Test string // test comment
Test string// test comment
echo 1;//test comment
http://domain.ltd // test comment
$wrong_slashes = "http://domain.ltd/asd//asd//asd";
function test() { // test comment
Test string /* test comment // test comment */
Test string //test /* test */
Test string /* test comment /* */ */
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
/**
multi-line comments
*/
<script src="//cdn.com/site.js">
$pattern = "([//])";
$pattern = '([//])';
<!-- html comments -->
<div>test</div> <!-- html comments -->
<!-- html comments --> <div>test</div>

Upvotes: 0

Views: 455

Answers (2)

Mert Aşan

Reputation: 436

I solved it by running the PHP CLI from a shell script. I did what I wanted in PHP, using the example at the linked URL.

example function

Upvotes: 0

tshiono

Reputation: 22012

If Perl is an option for you, try something like:

perl -e '
    while (<>) {
        $str .= $_; # slurps all lines into a string for multi-line matching
    }

    1 while $str =~ s#/\*(?!.*/\*).*?\*/##sg;
                    # removes C-style /* comments */ recursively
    $str =~ s#(?<!:)//[^\x27"]*?$##mg;
                    # removes // comments not preceded by ":" and not within quotes
    $str =~ s#<!--.*?-->##sg;
                    # removes <!-- html comments -->
    print $str;
' example.txt

which outputs:

Test string
Test string
echo 1;
http://domain.ltd
$wrong_slashes = "http://domain.ltd/asd//asd//asd";
function test() {
Test string
Test string
Test string
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

<script src="//cdn.com/site.js">
$pattern = "([//])";
$pattern = '([//])';

<div>test</div>
 <div>test</div>

I should note that the script above may be tuned to the given sample, and it will be easy to find unusual code for which my answer doesn't work. It may be close to impossible to write a flawless regex that matches comments without a proper parser for the language.

Upvotes: 1
