Reputation: 78914
For a small DSL I'm writing, I'm looking for a regex to match a comment at the end of a line, like the // syntax of C++.
The simple case:
someVariable = 12345; // assignment
is trivial to match, but the problem starts when I have a string on the same line:
someFunctionCall("Hello // world"); // call with a string
The // in the string should not match as a comment.
EDIT - The thing that compiles the DSL is not mine. As far as I'm concerned it's a black box, which I don't want to change, and it doesn't support comments. I just want to add a thin wrapper to make it support comments.
Upvotes: 1
Views: 2265
Reputation: 170148
shoosh wrote:
EDIT - The thing that compiles the DSL is not mine. As far as I'm concerned it's a black box, which I don't want to change, and it doesn't support comments. I just want to add a thin wrapper to make it support comments.
In that case, create a very simple lexer that matches one of three tokens:
(1) // ... comments
(2) " ... " string literals
(3) . (any other character)
Now, while you iterate over these three types of tokens, simply print tokens (2) and (3) to stdout (or to a file) to get the uncommented version of your source file.
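If Flex isn't at hand, the same three-token scan is small enough to write by hand. A minimal sketch in C, assuming the DSL uses C-style "..." strings with backslash escapes and has no character literals or block comments; strip_line_comments is a hypothetical name, not part of any library:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, dropping every // comment
 * that is not inside a double-quoted string literal. `out` must have room
 * for strlen(in) + 1 bytes. Assumes C-style "..." strings with backslash
 * escapes; character literals and block comments are not handled. */
void strip_line_comments(const char *in, char *out)
{
    int in_string = 0; /* are we inside a "..." token? */

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in_string) {
            if (in[0] == '\\' && in[1] != '\0') {
                /* escape sequence such as \" : copy both bytes verbatim */
                *out++ = *in++;
                *out++ = *in++;
                continue;
            }
            if (*in == '"')
                in_string = 0; /* closing quote ends token (2) */
            *out++ = *in++;
        } else if (*in == '"') {
            in_string = 1;     /* opening quote starts token (2) */
            *out++ = *in++;
        } else if (in[0] == '/' && in[1] == '/') {
            /* token (1): skip up to, but not including, the line break */
            in += strcspn(in, "\r\n");
        } else {
            *out++ = *in++;    /* token (3): any other character */
        }
    }
    *out = '\0';
}
```

Running it over each line (or over the whole file read into one buffer) and handing the result to the black-box compiler is the kind of thin wrapper the question asks for.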
A demo with GNU Flex:
example input file, in.txt:
someVariable = 12345; // assignment
// only a comment
someFunctionCall("Hello // world"); // call with a string
someOtherFunctionCall("Hello // \" world"); // call with a string and
// an escaped quote
The lexer grammar file, demo.l:
%%
"//"[^\r\n]* { /* skip comments */ }
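"\""([^"\\\n]|\\.)*"\"" { /* string literal: copy verbatim; an escape like \" does not end it */ printf("%s", yytext); }
.                       { /* any other character: copy verbatim */ printf("%s", yytext); }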
%%
int main(int argc, char **argv)
{
while(yylex() != 0);
return 0;
}
And to run the demo, do:
flex demo.l
cc lex.yy.c -lfl
./a.out < in.txt
which will print the following to the console:
someVariable = 12345;
someFunctionCall("Hello // world");
someOtherFunctionCall("Hello // \" world");
I'm not really familiar with C/C++, and just saw @sehe's recommendation to use a preprocessor. That looks like a far better option than writing your own (small) lexer. But I'll leave this answer, since it shows how to handle this kind of thing if no preprocessor is available (for whatever reason: perhaps cpp doesn't recognise certain parts of the DSL?).
Upvotes: 2
Reputation: 392931
EDIT
Since you are effectively preprocessing a source file, why not use an existing preprocessor? If the language is sufficiently similar to C/C++ (especially regarding quoting and string literals), you can simply use cpp -P:
echo 'int main() { char* sz="Hello//world"; /*profit*/ } // comment' | cpp -P
Output: int main() { char* sz="Hello//world"; }
Other ideas:
Use a proper lexer/parser instead
Have a look at
All suites come with sample grammars that parse C, C++ or a subset thereof
Upvotes: 2