Reputation: 5388
I was looking at the ANSI C grammar.
This page includes a lot of regular expressions in Lex/Flex for ANSI C.
I am having trouble understanding the regular expression for string literals.
It is given as \"(\\.|[^\\"])*\"
As far as I understand, \" matches a double quote, \\ matches the escape character, . matches any character except the escape character, and * means zero or more times.
[^\\"] matches any single character other than a backslash or a double quote.
So, in my opinion, the regular expression should be \"(\\.)*\" instead.
Can you give some strings for which my regular expression fails? Or, why did they use [^\\"] at all?
Upvotes: 8
Views: 8008
Reputation: 9538
The regex \"(\\.)*\" that you proposed matches only strings whose body is a \ followed by any character, repeated zero or more times, like:
"\z\x\p\r"
This regular expression would therefore fail to match a string like:
"hello"
The string "hello" would be matched by the regex \".*\" but that would also match the strings """" and "\", both of which are invalid.
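The two claims so far can be checked with a small sketch. It uses Python's re module as a stand-in for Flex, which is an assumption: the patterns are written as Python raw strings, and re.fullmatch is used to ask "is this whole text a single string literal?", which is not how a Flex scanner selects matches. The names PROPOSED, PERMISSIVE and accepts are made up for this illustration.

```python
import re

# The asker's proposed pattern and the overly permissive one,
# transcribed from the Lex notation into Python raw strings.
PROPOSED = r'"(\\.)*"'
PERMISSIVE = r'".*"'

def accepts(pattern, text):
    """True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

hello = '"hello"'
four_quotes = '""""'
lone_backslash = r'"\"'   # quote, backslash, quote

print(accepts(PROPOSED, hello))             # False: h is not an escape pair
print(accepts(PERMISSIVE, hello))           # True
print(accepts(PERMISSIVE, four_quotes))     # True, although invalid
print(accepts(PERMISSIVE, lone_backslash))  # True, although invalid
```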
To get rid of these invalid matches we can use \"[^\\"]*\", but this now fails to match a string like "\a\a\a", which is a valid string.
As we saw, \"(\\.)*\" does match this string, so combining the two with | gives \"(\\.|[^\\"])*\", which is the pattern from the grammar.
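One way to see why the combined pattern works is to look at how it consumes the body of a literal: each step takes either an escape pair (a backslash plus the next character) or one ordinary character that is neither a backslash nor a quote. The following is a rough sketch of that idea using Python's re module rather than Flex; STRING_LITERAL, UNIT and split_units are names invented here, and the capturing group around the body is added only so the body can be inspected.

```python
import re

# The combined pattern, with the body captured so it can be examined.
STRING_LITERAL = re.compile(r'"((?:\\.|[^\\"])*)"')
# One step of the repetition: an escape pair, or one ordinary character.
UNIT = re.compile(r'\\.|[^\\"]')

def split_units(literal):
    """Return the body's units, or None if the literal is rejected."""
    match = STRING_LITERAL.fullmatch(literal)
    if match is None:
        return None
    return UNIT.findall(match.group(1))

print(split_units('"hello"'))    # five one-character units
print(split_units(r'"\a\a\a"'))  # three escape pairs
print(split_units(r'"a\"b"'))    # the escaped quote stays inside the body
print(split_units(r'"\"'))       # None: the backslash swallows the closing quote
print(split_units('""""'))       # None: a bare quote cannot appear in the body
```

Because the backslash is excluded from the character class, a backslash can only ever be consumed together with the character after it, which is what stops \" from ending the literal early.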
Upvotes: 5