knechtsr

Reputation: 11

Regex For Strings in C

I'm looking to make a regular expression for some strings in C.

This is what I have so far:

Strings in C are delimited by double quotes ("), so the regex has to begin and end with \".

The string may not contain newline characters, so I need [^\n] (I think).

The string may also contain double quotes or backslash characters if and only if they're escaped, so [\\ \"] (again, I think).

Other than that anything else goes.

Any help is much appreciated. I'm kind of lost on how to start writing this regex.

Upvotes: 0

Views: 5968

Answers (1)

rici

Reputation: 241931

A simple flex pattern to recognize string literals (including literals with embedded line continuations):

["]([^"\\\n]|\\.|\\\n)*["]

That will allow

   "string with \
line continuation"

But not

"C doesn't support
 multiline strings"

If you don't want to deal with line continuations, remove the \\\n alternative. If you need trigraph support, it gets more irritating.
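To make the three alternatives in the pattern concrete, here is a rough hand-written C equivalent. It is only a sketch: the function name match_string_literal is made up for illustration, and it assumes the input is a NUL-terminated buffer, so it treats a NUL byte as end of input (flex itself has no such restriction).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring ["]([^"\\\n]|\\.|\\\n)*["].
 * Returns the length of the string literal starting at s[0],
 * including both quotes, or 0 if no literal starts there.
 */
size_t match_string_literal(const char *s)
{
    assert(s != NULL);

    if (s[0] != '"')
        return 0;                 /* must open with a double quote */

    size_t i = 1;
    for (;;) {
        char c = s[i];

        if (c == '"')
            return i + 1;         /* closing quote ends the match */
        if (c == '\0' || c == '\n')
            return 0;             /* unterminated, or a raw newline */

        if (c == '\\') {
            if (s[i + 1] == '\0')
                return 0;         /* backslash at end of input */
            i += 2;               /* \\. or \\\n: consume both characters */
        } else {
            i += 1;               /* [^"\\\n]: any ordinary character */
        }
    }
}
```

The backslash branch covers both the \\. and the \\\n alternatives. To reject line continuations, as described above, that branch would also have to return 0 when the character after the backslash is a newline.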

Although that recognizes strings, it doesn't attempt to make sense of them. Normally, a C lexer will want to process the backslash sequences in a string, so that the literal "\"\n" is converted to the two characters " and NL (0x22 0x0A). At some point you might want to take a look at, for example, Optimizing flex string literal parsing (although that will need to be adapted if you are programming in C).
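As a rough illustration of that processing step, the following C sketch decodes the text between the quotes of an already matched literal. The name decode_string_body and its interface are assumptions made for this example, and it handles only a few simple escapes; octal, hexadecimal and universal character name escapes are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: decode the body of a string literal
 * (the characters between the quotes) into out.
 * out must have room for len + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t decode_string_body(const char *body, size_t len, char *out)
{
    assert(body != NULL && out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = body[i];

        if (c == '\\' && i + 1 < len) {
            c = body[++i];        /* look at the escaped character */
            switch (c) {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case 'r':  c = '\r'; break;
            case '\n': continue;  /* line continuation: emit nothing */
            default:   break;     /* \" \\ \' and anything else: keep as is */
            }
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

With the body of "\"\n" as input (the four characters backslash, quote, backslash, n), this writes the two bytes 0x22 0x0A.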

Flex patterns are documented in the flex manual. It might also be worthwhile reading a good reference on regular expressions, such as John Levine's excellent book on Flex and Bison.

Upvotes: 5
