nowox

Reputation: 29178

Strip comments in a C file while reading it line by line

I would like to refactor assembly files that use C/C++-style comments and preprocessor directives.

Unfortunately I cannot use a formatting tool such as Astyle, so I have to parse the files myself.

My refactoring algorithm iterates over each line of a file as shown below:

while(<FH>)
{
   next if isComment($_);

   $count += s/$search/$replace/;   # A refactoring rule
   $count += arithmetic($_);        # R1=R2+3*4;  --> r1 = r2 + 3 * 4;
   ...
   $out .= $_;
}

if($count)
{
   open my $fh, '>', $filename or die "Unable to open $filename: $!";
   print $fh $out;
   close $fh;
}

With this method I cannot accurately detect a comment line. So I implemented a counter that is incremented on each /* and decremented on each */. While the counter is greater than 0, I ignore the line.

Unfortunately this method won't work in this case:

/*     /* <-- Not allowed      */ /*    */ 

The counter ends up at 1 while it should be 0: C comments do not nest, so the second /* is just part of the first comment's text.
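
To make the failure reproducible, here is a minimal sketch of that counter applied to the line above. The helper name comment_depth is made up for this example; it only counts delimiters and is not a proposed fix.

```perl
use strict;
use warnings;

# Naive counter: +1 for every "/*", -1 for every "*/" found on the line.
# comment_depth is a hypothetical name used only for this illustration.
sub comment_depth {
    my ($line) = @_;
    my $opens  = () = $line =~ m{/\*}g;   # number of "/*" occurrences
    my $closes = () = $line =~ m{\*/}g;   # number of "*/" occurrences
    return $opens - $closes;
}

my $line = '/*     /* <-- Not allowed      */ /*    */ ';
print comment_depth($line), "\n";   # prints 1, although no comment is left open
```

The line contains three /* but only two */, so any scheme that treats /* as a nesting opener drifts by one here.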

So I am looking for an accurate way to detect comment blocks and ignore them. Is there any package or module that can help me?

Upvotes: 0

Views: 164

Answers (2)

nowox

Reputation: 29178

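Eventually I found this solution, which works pretty well. I read the whole file at once, identify all the comment blocks and replace them with markers /*@@n@@*/ where n is a number. String literals are matched by the same pattern, so comment delimiters inside a string are not mistaken for a comment.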

Once the processing is done, I can restore the original comments.

#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;

# C/C++ Comment detection
my $re = qr{(
   /\*         ##  Start of /* ... */ comment
   [^*]*\*+    ##  Non-* followed by 1-or-more *'s
   (?:
     [^/*][^*]*\*+
   )*
   /
   |//.*      ##  // ... comment
   |"(?:\\.|[^"\\])*"   ##  A double-quoted string (escaped quotes allowed)
   |'(?:\\.|[^'\\])*'   ##  A single-quoted string (escaped quotes allowed)
   )}mx;

my $i = 0;
my @comments = ();

# File names are taken from the command line here (an assumption).
for my $filename (@ARGV) {
    next unless -f $filename;

    # Read whole file
    open my $fh, '<', $filename or die "Unable to open $filename";
    $_ = do {local $/ = undef; <$fh>};

    # Store C/C++ comments and replace them with markers
    $i = 0;
    @comments = ();
    s|$re|&store($1)|eg;

    # Do the processing
    ...

    # Restore C comments
    $i = 0;
    for my $comment (@comments) {
       my $s = quotemeta("/*@@".$i++."@@*/");
       s|$s|$comment|g;
    }
}

sub store {
    my $marker = "/*@@".$i."@@*/";
    $comments[$i] = shift;
    $i++;
    $marker;
}
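
As a self-contained illustration of the mask-and-restore idea, here is a minimal sketch that works on a string instead of a file. The names mask_comments and restore_comments are made up for this example, the sample input is invented, and the s/mov/MOV/ substitution is only a stand-in for a real refactoring rule.

```perl
use strict;
use warnings;

# Comments and string literals, in the spirit of the pattern above.
my $re = qr{(
    /\*[^*]*\*+(?:[^/*][^*]*\*+)*/   # block comment
  | //[^\n]*                         # line comment
  | "(?:\\.|[^"\\])*"                # double-quoted string
  | '(?:\\.|[^'\\])*'                # single-quoted string
)}x;

# Replace every match with a numbered marker and remember the original text.
# (hypothetical helper name)
sub mask_comments {
    my ($text) = @_;
    my @saved;
    $text =~ s{$re}{ push @saved, $1; '/*@@' . $#saved . '@@*/' }ge;
    return ($text, \@saved);
}

# Put the remembered text back in place of each marker.
# (hypothetical helper name)
sub restore_comments {
    my ($text, $saved) = @_;
    $text =~ s{/\*\@\@(\d+)\@\@\*/}{$saved->[$1]}g;
    return $text;
}

my $src = qq{mov r1, r2 /* R1=R2 */\nputs("/* not a comment */"); // R3=R4\n};
my ($masked, $saved) = mask_comments($src);
(my $work = $masked) =~ s/\bmov\b/MOV/g;   # stand-in for a refactoring rule
print restore_comments($work, $saved);
```

Restoring with a single substitution over the markers avoids rescanning text that has already been put back, which matters if a restored comment happens to contain something that looks like a marker.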

Upvotes: 0

urzeit

Reputation: 2909

You have to parse the code in more detail, since comment delimiters may appear inside a string literal or inside an #ifdef block.

Maybe you should run a preprocessor to prepare the code for you. For the GCC preprocessor, have a look at the question "How do I run the GCC preprocessor to get the code after macros like #define are expanded?".

You may want to output the preprocessed code to stdout and read it through a pipe in your Perl code.
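
A minimal sketch of such a pipe, assuming gcc is on the PATH and that the file has an extension gcc recognises as preprocessable. The helper name read_preprocessed is made up for this example. -E stops after preprocessing and -P suppresses the line markers; comments are dropped by default.

```perl
use strict;
use warnings;

# Run a preprocessor command on $file and return its output lines.
# read_preprocessed is a hypothetical helper; the default command
# ('gcc -E -P') is an assumption, so adjust it to your toolchain.
sub read_preprocessed {
    my ($file, @cmd) = @_;
    @cmd = ('gcc', '-E', '-P') unless @cmd;
    # List form of open: no shell is involved, so $file needs no quoting.
    open my $pp, '-|', @cmd, $file or die "Cannot run @cmd: $!";
    my @lines = <$pp>;
    close $pp or die "@cmd failed on $file";
    return @lines;
}

print read_preprocessed($ARGV[0]) if @ARGV;
```

Note that the output is the expanded code, with macros replaced and includes pulled in, so it is suited to analysis rather than to writing the result back over the original source.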

To do it completely right you have to parse all include files, too. Imagine the following (really bad, but valid) code:

inc1.h

/*

inc2.h

*/

main.c

#include <stdio.h>

int main() {
    #include "inc1.h"
    printf("Ha!\n");
    #include "inc2.h"
}

Upvotes: 1
