Happy Mittal

Reputation: 3747

Regular expression for constants in C

I want to write a regular expression for constants in the C language, so I tried this:

Let

Then:

I want to know whether I have written the correct regular expression. Is there any other way of writing it?

Upvotes: 0

Views: 8636

Answers (4)

Jean-Damien Durand

Reputation: 166

From a Perl point of view, after reading ISO C 2011, I came up with the following regexp:

my $I_CONSTANT = qr/^(?:(0[xX][a-fA-F0-9]+(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)             # Hexadecimal
                      |([1-9][0-9]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)                    # Decimal
                      |(0[0-7]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)                        # Octal
                      |([uUL]?'(?:[^'\\\n]|\\(?:[\'\"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))+')    # Character
                    )$/x;

Upvotes: 1

wilhelmtell

Reputation: 58685

First, C (as of C99) has no Unicode character literals; the u and U prefixes were only added in C11. So you can eliminate the last rule. You also define only integer literals, not floating-point, string, or character literals; for convenience, I assume that is what you intended.

INT    := OCTINT | DECINT | HEXINT
DECINT := [1-9] [0-9]* SUFFIX?
OCTINT := 0 [0-7]* SUFFIX?
HEXINT := 0 [xX] [0-9a-fA-F]+ SUFFIX?
SUFFIX := [uU] LONG? | LONG [uU]?
LONG   := l | L | ll | LL

These only describe the form of the literals, not any logic such as maximum values.
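The integer rules translate fairly directly into a single regular expression. The following is a minimal sketch, assuming Python's `re` module; the names `LONG`, `SUFFIX`, `INT_CONSTANT` and `is_int_constant` are illustrative, not from any library. It follows C99 §6.4.4.1 for prefixes and suffixes (0x or 0X; u or U combined with l, L, ll or LL in either order; no mixed-case lL). Like the rules, it checks only the form, not the range.

```python
import re

# Long suffix: l, L, ll or LL (mixed-case lL / Ll is not valid C).
LONG = r"(?:ll|LL|[lL])"
# Integer suffix: unsigned part and long part, each optional, in either order.
SUFFIX = rf"(?:[uU]{LONG}?|{LONG}[uU]?)"

INT_CONSTANT = re.compile(
    rf"""
    (?:
        0[xX][0-9a-fA-F]+   # hexadecimal: 0x or 0X, then at least one hex digit
      | [1-9][0-9]*         # decimal: must start with a nonzero digit
      | 0[0-7]*             # octal: a leading 0, including the constant 0 itself
    )
    {SUFFIX}?
    """,
    re.VERBOSE,
)


def is_int_constant(text: str) -> bool:
    """Return True if text has the form of a C integer constant (range not checked)."""
    return INT_CONSTANT.fullmatch(text) is not None
```

For example, `is_int_constant("0X1Aul")` and `is_int_constant("123ULL")` return True, while `is_int_constant("10lL")` and `is_int_constant("08")` return False.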

Upvotes: 2

Jonathan Leffler

Reputation: 754700

The 'RE' makes sense if we interpret the 'U' as being similar to set union. However, it is more conventional to use a '|' symbol to denote alternatives.

First, you are dealing only with integer constants, not with floating-point, character, or string constants, let alone more complex constants.

Second, you have omitted '0X' as a valid hex prefix.

Third, you have omitted the various suffixes: U, L, UL, LL, ULL (and their lower-case synonyms and permutations, such as lu or LLU; note that ll and LL must not mix case, so lL is invalid).

Also, the C standard (§6.4.4.1) distinguishes between digits and non-zero digits in a decimal constant:

decimal-constant:
    nonzero-digit
    decimal-constant digit

Any integer constant starting with a zero is an octal constant, never a decimal constant. In particular, writing 0 is writing an octal constant.
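To make the decimal/octal distinction concrete, here is a small Python sketch (the function `constant_kind` is a hypothetical helper, not a standard API) that reports which kind of integer constant a string is. Because the decimal alternative requires a nonzero first digit, "0" is classified as octal, and "08" is rejected outright.

```python
import re
from typing import Optional

# Suffix rules are the same for every base (C99 6.4.4.1); the whole suffix is optional.
SUFFIX = r"(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?"

# Exactly one named group takes part in any successful match,
# so lastgroup tells us which kind of constant was recognised.
KIND = re.compile(
    rf"(?:(?P<hex>0[xX][0-9a-fA-F]+)|(?P<dec>[1-9][0-9]*)|(?P<oct>0[0-7]*)){SUFFIX}"
)


def constant_kind(text: str) -> Optional[str]:
    """Return 'hex', 'dec' or 'oct' for a C integer constant, else None."""
    m = KIND.fullmatch(text)
    return m.lastgroup if m else None
```

For example, `constant_kind("0")` returns `'oct'`, `constant_kind("755")` returns `'dec'`, and `constant_kind("08")` returns None.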

Upvotes: 2

Jens Gustedt

Reputation: 78953

There is another kind of integer constant, namely integer character constants such as 'a' or '\n'. In C99 these are constants, and their type is just int.

The authoritative grammar for all of these, from which the regular expressions follow directly, is in section 6.4 of the standard (draft N1124): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
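As an illustration of how the character-constant grammar (C99 §6.4.4.4) maps onto a regex, here is a hedged Python sketch; the names `ESCAPE`, `CHAR_CONSTANT` and `is_char_constant` are illustrative. It accepts only the C99 L prefix (C11 also allows u and U). It covers simple, octal, hexadecimal and universal-character-name escapes, and checks form only, not value constraints.

```python
import re

# Escape sequences from C99 6.4.4.4: simple, octal (1 to 3 digits),
# hexadecimal (\x plus one or more hex digits), and universal character names.
ESCAPE = r"""\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]+|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})"""

# Optional L prefix, then one or more c-chars between single quotes.
# A c-char is any character except ', \ and newline, or an escape sequence.
CHAR_CONSTANT = re.compile(rf"L?'(?:[^'\\\n]|{ESCAPE})+'")


def is_char_constant(text: str) -> bool:
    """Return True if text has the form of a C99 character constant."""
    return CHAR_CONSTANT.fullmatch(text) is not None
```

Note that multi-character constants such as 'ab' match, since the grammar allows them (their value is implementation-defined), while '' and '\q' do not.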

Upvotes: 8
