Reputation: 111
How can I match and delete all comments from a line? Using sed, I can delete comments that start at the beginning of a line, or ones on lines without quotes. But my script fails on examples like this one:
This one "# this is not a comment" # but this "is a comment"
Can sed handle this case? If so, what is the regex?
Example:
Input:
This one "# this is not a comment" # but this "is a comment"
Output:
This one "# this is not a comment"
Upvotes: 3
Views: 247
Reputation: 1603
You can use a lexical analyzer like Flex, applied directly to the script. Its manual has a section titled "How can I match C-style comments?", and I think you can adapt that part to your problem.
If you need an in-depth tutorial, you can find it here; under the "Lexical Analysis" section there is a PDF that introduces the tool and an archive with some practical examples, including "c99-comment-eater", which you can draw inspiration from.
Upvotes: 1
Reputation: 46
If we assume that # does not start a comment when it is inside quotes or escaped with a backslash, then we can define the following regex:
(ES|RT|QT)*C?
where
ES - escape sequence: a backslash followed by any one character
\\.
RT - non-special regular text
[^"\\#]*
QT - text in quotes
"[^"]*"
C - comment: starts with an unescaped, unquoted hash sign # and runs to the end of the line
#.*
A possible solution using sed (the \| alternation inside a basic regular expression is a GNU sed extension):
sed 's/^\(\(\\.\|[^"\\#]*\|"[^"]*"\)*\)#.*$/\1/'
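For a quick check against the example from the question, here is a minimal sketch of the same pattern written as an extended regular expression with sed -E, which avoids the backslashed \( \) and \| forms. The function name strip_comments is my own, chosen for illustration, and I wrote the regular-text part as [^"\\#]+ instead of [^"\\#]*, which should match the same lines. It assumes quotes are balanced on each line and does not handle escaped quotes inside a quoted string.

```shell
#!/bin/sh
# strip_comments is a hypothetical helper name used only for this sketch.
# It reads lines on stdin and removes the first unquoted, unescaped '#'
# together with everything after it. Lines without such a '#' are left as-is.
strip_comments() {
    # \\.        escape sequence: backslash plus one character
    # [^"\\#]+   regular text: anything except quote, backslash, hash
    # "[^"]*"    text in double quotes
    # #.*$       the comment itself, dropped from the output
    sed -E 's/^((\\.|[^"\\#]+|"[^"]*")*)#.*$/\1/'
}

# The example from the question:
printf '%s\n' 'This one "# this is not a comment" # but this "is a comment"' | strip_comments
```

Note that the space before the # is kept, so the result ends with a trailing space. If that matters, it has to be trimmed in a separate step.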
Upvotes: 1