coding

Reputation: 9

Using sed to show C-style and C++-style comments

So far I have tried this, but it does not print what I want. Thank you in advance.

$  sed -n -e "/\(*\)/g" c_comments | sed -n '/\/\*/p; /^ \*/p' c_comments |sed -n '/[[:blank:]]/p' c_comments 

This is the text file c_comments. I want to extract the C-style and C++-style comments from it.

// File with various examples of C and C++ style comments

/* simple C style comment on one line with no code  */

x = 5*3;   /* Example comment following code   */

   /* comments do not have to begin at the line beginning  */
/*  And you can have empty comments like the next one */
/**/
/*  comments with code following the comment  (not possible with C++ style) 
*/  x = w * u/y/z;
  // As shown below you can have what appear to comments embedded in a 
 string
    // The line below should be counted as code
printf(" This output string looks like a /* comment */ doesn't it?\n"); 
/* ---- Example of a multiline 
C style comment  */       
c++;  //  C ++ style comment following code
c=a/b*c;   /*  comment between two pieces of code */   w = a*b/e;
 /*  This is a multiline c style comment.  This
comment covers several
lines.
 ------*/
a = b / c * d;
/* -----------End of the file ---------------*/ 

Upvotes: 1

Views: 1518

Answers (2)

Michael Back

Reputation: 1871

The following will process your particular file.

#! /bin/sed -f

# using ':' for regex delimiter, as we are matching a lot of
# literal '/' in the following expressions

# remove quoted strings
# TODO: allow quotes in comments and detect multiline quoted strings
s:["][^"]*["]::g

# detect '/* ... */'
\:/\*.*\*/: {
    # handle leading '//'
    \://.*/\*: {
        s:.*//://:
        p;d
    }
    s:.*/\*:/*:
    s:\*/.*:*/:
    p;d
}

# detect '/* ... \n ... */'
# TODO: fix the '// ... /*' case
\:/\*:,\:\*/: {
    s:.*/\*:/*:
    s:\*/.*:*/:
    p;d
}

# detect //
\://: {
    s:.*//://:
    p;d
}
d

The above is more of a non-example than an example: it shows some of the things that are really difficult to do in sed (pay special attention to the TODOs).
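To make the first TODO concrete, here is a minimal sketch that runs only the quote-stripping rule from the script above. The function name strip_quoted is made up for this illustration and is not part of the script.

```shell
#!/bin/sh
# Hypothetical helper: applies only the quote-stripping rule from the script above.
strip_quoted() {
    sed 's:["][^"]*["]::g'
}

# The quoted word is part of the comment, yet it is removed.
printf '%s\n' '/* nor "is" this. */' | strip_quoted
# prints: /* nor  this. */

# A string that really is code is removed as intended, comment look-alike included.
printf '%s\n' 'printf(" looks like a /* comment */ "); // real comment' | strip_quoted
# prints: printf(); // real comment
```

The rule cannot tell a string literal in code from a quoted word inside a comment, because at that point the script has no notion of "inside a comment" yet.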

So, in general, extracting C comments with a single sed script is, IMO, not a good fit. Getting it completely correct would be very hard, and the result would quickly devolve into very obscure code.

Here is an alternative that decorates the C source with sed, filters it with awk (taking into account the multi-level C syntax rules), and then removes the decoration again with sed:

c_decorate

#! /bin/sed -f
s:\r::g
s:/\*:\rC/*:g
s:\*/:*/\rE:g
s://:\rL//:g
s:":\rQ":g

c_filter

#! /usr/bin/awk -f
BEGIN {
    RS = ORS = "\r"
    lc=0 # State variable for continuing a C++ style Line comment
    cc=0 # State variable for continuing a C style comment
    qt=0 # quote-count
}

NR == 1 { print ""; next }

/^C/ { # Begin C-Style Comment
    if (qt % 2)
        next
    if (lc) {
        if ($0 ~ /\n/) {
            lc = 0
            sub(/\n.*/, "\n")
        }
    } else {
        cc = 1
    }
    print
    next
}
/^E/ { # End C-Style Comment
    if (qt % 2)
        next
    if (lc) {
        if ($0 ~ /\n/) {
            lc = 0
            sub(/\n.*/, "\n")
        }
        print 
    } else if (cc) {
        cc = 0
        if ($0 ~ /\n/)
            print "\n"
        else
            print "E"
    }
    next    
}
/^L/ { # Begin C++ Style Line Comment
    if (qt % 2)
        next
    if (!cc) {
        lc = 0
        if ($0 ~ /\n/)
            sub(/\n.*/, "\n")
        else
            lc = 1
    }
    print
    next
}
/^Q/ { # Quote
    if (lc || cc)
        print
    else
        qt++
    next
}

c_cleanup

#! /bin/sed -f
$ {
    /^$/ d
}
s:\r[CELQ]\?::g

And to call:

$ c_decorate c_comments | c_filter | c_cleanup

Awk is a more natural fit than sed for the filtering step because it natively supports changing the record separator, and it is much easier to specify and reason about arbitrary logical relationships in it.
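As a rough illustration of the record-separator point, the sketch below tags the delimiters the same way c_decorate does and then lets awk number the resulting records. The function name show_records is invented for this example, and it assumes GNU sed, which accepts "\r" in the replacement text.

```shell
#!/bin/sh
# Hypothetical helper: tag comment delimiters with "\r" plus a letter
# (as c_decorate does), then let awk split on "\r" and number the records.
# Assumes GNU sed, which understands "\r" in the replacement text.
show_records() {
    sed -e 's:/\*:\rC/*:g' -e 's:\*/:*/\rE:g' -e 's://:\rL//:g' |
    awk 'BEGIN { RS = "\r" }
         { sub(/\n$/, ""); printf "%d: [%s]\n", NR, $0 }'
}

printf '%s\n' 'x = 5*3;   /* note */  // tail' | show_records
# prints:
# 1: [x = 5*3;   ]
# 2: [C/* note */]
# 3: [E  ]
# 4: [L// tail]
```

Each record starts with the tag letter that says what kind of token opened it, so the awk rules in c_filter only have to look at the first character and a few state variables.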

To strip the comment delimiters (/*, */ and //) from the output, here is an alternative version of c_decorate:

#! /bin/sed -f
s:\r::g
s:/\*:\rC:g
s:\*/:\rE:g
s://:\rL:g
s:":\rQ":g

Update 9/2019 (@Russ): This does not appear to handle "quotes" in comments very well, nor a C-style comment opener embedded in a C++ one-line comment, as in

//* this is not handled well.  
/* nor "is" this. */
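A small sketch of what happens to the first of those lines, using the two relevant rules from the original c_decorate. The name show_tags is made up here, "\r" is displayed as "@" so the tags are visible, and GNU sed is assumed.

```shell
#!/bin/sh
# Hypothetical helper: run the two relevant rules of the first c_decorate
# and show each inserted "\r" as "@" so the tags are visible.
# Assumes GNU sed for "\r" in the replacement text.
show_tags() {
    sed -e 's:/\*:\rC/*:g' -e 's://:\rL//:g' | tr '\r' '@'
}

# An ordinary line comment gets the L tag, as intended.
printf '%s\n' 'c++;  // note' | show_tags
# prints: c++;  @L// note

# Here the '/*' inside '//*' is tagged first, which splits the '//'
# so that the line-comment rule no longer matches.
printf '%s\n' '//* this is not handled well.' | show_tags
# prints: /@C/* this is not handled well.
```

The line ends up tagged as the start of a C-style comment, so the filter keeps waiting for a closing */ that never comes on that line.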

Therefore, I used this for c_decorate:

#! /bin/sed -f
s:\r::g
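# trouble is that the '/*' inside '//*' gets tagged as a C comment start,
# so only tag '/*' when it is not preceded by '/'
# (the preceding character is captured and put back so it is not lost)
s:\([^/]\)/\*:\1\rC/*:g
s:^/\*:\rC/*:g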
s:\*/:*/\rE:g
s://\+:\rL&:g
# does not handle quotes w/in comments
# s:":\rQ":g

Upvotes: 3

coding

Reputation: 9

sed -n -e '/^[[:space:]]*\/\//p' -e ' /^[[:space:]]*\/\*.*\*\/[[:space:]]*$/p' -e '/^[[:space:]]*$/p' c_comments

Upvotes: -1
