Reputation: 949
I'm experimenting to learn flex and would like to match string literals. My code currently looks like:
"\""([^\n\"\\]*(\\[.\n])*)*"\"" {/*matches string-literal*/;}
I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.
I am probably just writing a poor regular expression or one incompatible with flex. Please advise!
Upvotes: 61
Views: 118271
Reputation: 45324
A string consists of a quote mark
"
followed by zero or more of either an escaped anything
\\.
or a character that is neither a quote nor a backslash
[^"\\]
and finally a terminating quote
"
Put it all together, and you've got
\"(\\.|[^"\\])*\"
The delimiting quotes are escaped because they are Flex meta-characters.
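As a rough check of the pattern above, here is a minimal sketch that tries it on a few inputs. It uses Python's re module as a stand-in for flex. That is an assumption: this particular pattern reads the same in both, but flex's longest-match scanning is not reproduced. The names STRING_LITERAL and is_string_literal are made up for illustration.

```python
import re

# Same pattern as above, written as a Python raw string.
# Python's re is only a stand-in for flex here (assumption).
STRING_LITERAL = re.compile(r'"(\\.|[^"\\])*"')

def is_string_literal(text):
    """Return True if the whole of text is exactly one string literal."""
    return STRING_LITERAL.fullmatch(text) is not None

print(is_string_literal('"hello"'))            # plain string
print(is_string_literal(r'"say \"hi\""'))      # escaped quotes inside
print(is_string_literal(r'"ends with \\"'))    # escaped backslash before the closing quote
print(is_string_literal(r'"unterminated\"'))   # the last quote is escaped, so nothing closes it
```

The two alternatives never compete for the same character (one starts with a backslash, the other excludes it), so the first unescaped quote is the only place the match can end.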
Upvotes: 131
Reputation: 3253
Here is my code snippet for handling strings in flex; I hope it gives you some ideas.
Using a start condition to handle string literals is more scalable and clearer.
%x SINGLE_STRING
%%
\" BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
\n yyerror("the string misses \" to terminate before newline");
<<EOF>> yyerror("the string misses \" to terminate before EOF");
([^\\\"\n]|\\.)* {/* do your work here, e.g. save the text; \n is excluded so the newline rule above can fire */}
\" BEGIN(INITIAL);
. ;
}
Upvotes: 3
Reputation: 887
This is what we use in Zolang for single-line string literals with embedded templates ${...}:
\"(\$\{.*\}|\\.|[^\"\\])*\"
Upvotes: 2
Reputation: 11
A late answer, but it may be useful to the next person who needs it:
\"(([^\"]|\\\")*[^\\])?\"
Upvotes: 0
Reputation: 34592
How about using a start state...
int enter_dblquotes = 0;
%x DBLQUOTES
%%
\" { BEGIN(DBLQUOTES); enter_dblquotes++; }
<DBLQUOTES>*\" {
    if (enter_dblquotes) {
        handle_this_dblquotes(yytext);
        BEGIN(INITIAL); /* revert back to normal */
        enter_dblquotes--;
    }
}
...more rules follow...
It was something to that effect. Flex uses %s or %x to declare the states that can be expected. When the flex input detects a quote, it switches to another state, then continues lexing until it reaches another quote, at which point it reverts to the normal state.
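The state switching described above can also be sketched outside flex. The following is a minimal hand-written illustration in Python, not flex output. The INITIAL and DBLQUOTES names mirror the start conditions above, while the scan_strings function, its escape handling, and its error for an unterminated string are assumptions added for the example.

```python
INITIAL, DBLQUOTES = "INITIAL", "DBLQUOTES"

def scan_strings(source):
    """Collect the bodies of double-quoted strings found in source."""
    state = INITIAL
    current = []   # pieces of the string currently being read
    found = []     # completed string bodies
    i = 0
    while i < len(source):
        ch = source[i]
        if state == INITIAL:
            if ch == '"':
                state = DBLQUOTES          # comparable to BEGIN(DBLQUOTES)
                current = []
        else:  # state == DBLQUOTES
            if ch == '\\' and i + 1 < len(source):
                # Keep the escape pair as written and skip the escaped character.
                current.append(source[i:i + 2])
                i += 1
            elif ch == '"':
                found.append("".join(current))
                state = INITIAL            # comparable to BEGIN(INITIAL)
            else:
                current.append(ch)
        i += 1
    if state == DBLQUOTES:
        raise ValueError("string is missing its closing quote")
    return found

print(scan_strings('x = "a" + "b\\"c"'))
```

Keeping the state explicit makes it easy to add per-state behaviour later, such as reporting an error on a raw newline, which is the main attraction of start conditions over a single regular expression.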
Upvotes: 9
Reputation: 299
For a single line... you can use this:
\"([^\\\"]|\\.)*\" {/*matches string-literal on a single line*/;}
Upvotes: 29
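One caveat worth checking with the single-line pattern above: in flex, a negated class such as [^\\\"] also accepts a raw newline unless \n is listed in it, so the pattern as written can span lines. The sketch below compares it with a hypothetical variant that rejects raw newlines but still accepts a backslash followed by a newline, as the question asks. It again uses Python's re as a stand-in for flex (an assumption), and the names AS_WRITTEN and NO_RAW_NEWLINE are made up for illustration.

```python
import re

# The pattern as written, and a variant that adds \n to the negated class
# and lets a backslash escape a newline as well.
# Python's re is only a stand-in for flex here (assumption).
AS_WRITTEN = re.compile(r'"([^\\"]|\\.)*"')
NO_RAW_NEWLINE = re.compile(r'"([^\\"\n]|\\(.|\n))*"')

two_lines = '"first\nsecond"'      # raw newline inside the quotes
continued = '"first\\\nsecond"'    # backslash followed by a newline

for name, pattern in (("as written", AS_WRITTEN), ("variant", NO_RAW_NEWLINE)):
    print(name,
          pattern.fullmatch(two_lines) is not None,
          pattern.fullmatch(continued) is not None)
```

In flex syntax the variant would read \"([^\\\"\n]|\\(.|\n))*\", which is a suggestion rather than something taken from the answers above.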