Reputation: 3747
I want to write a regular expression for constants in the C language. This is what I tried:
Let
Then:
I want to know whether the regular expression I have written is correct. Is there any other way of writing it?
Upvotes: 0
Views: 8636
Reputation: 166
From a Perl point of view, I came up with the following regexp after reading ISO C 2011:
my $I_CONSTANT = qr/^(?:(0[xX][a-fA-F0-9]+(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?) # Hexadecimal
|([1-9][0-9]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?) # Decimal
|(0[0-7]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?) # Octal
|([uUL]?'(?:[^'\\\n]|\\(?:[\'\"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))+') # Character
)$/x;
Upvotes: 1
Reputation: 58685
First, C does not support Unicode literals, so you can eliminate the last rule. You also define only integer literals, not floating-point, string, or character literals. For convenience, I assume that is what you intended.
INT := OCTINT | DECINT | HEXINT
SUFFIX := [uU] (ll | LL | [lL])? | (ll | LL | [lL]) [uU]?
DECINT := [1-9] [0-9]* SUFFIX?
OCTINT := 0 [0-7]* SUFFIX?
HEXINT := 0 [xX] [0-9a-fA-F]+ SUFFIX?
These only describe the form of the literals, not any logic such as maximum values.
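As a rough illustration, the grammar above can be sketched in Python's `re` module. The names `SUFFIX`, `INT_CONSTANT`, and `is_int_constant` are made up for this example, and the suffix rule assumes the C99/C11 forms (an unsigned part and a long part in either order, with `lL` and `Ll` rejected). Like the grammar, it checks form only, not value ranges.

```python
import re

# Suffix: an unsigned part optionally followed by a long part, or a long
# part optionally followed by an unsigned part. Mixed-case "lL" / "Ll"
# is deliberately not accepted.
SUFFIX = r"(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)"

INT_CONSTANT = re.compile(
    rf"""
    (?: 0[xX][0-9a-fA-F]+   # hexadecimal
      | [1-9][0-9]*         # decimal
      | 0[0-7]*             # octal, which includes a plain 0
    )
    {SUFFIX}?
    """,
    re.VERBOSE,
)


def is_int_constant(text: str) -> bool:
    """Return True if the whole string has the form of a C integer constant."""
    return INT_CONSTANT.fullmatch(text) is not None
```

Using `fullmatch` rather than `match` means trailing characters such as the `8` in `08` cause a rejection instead of a partial match.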
Upvotes: 2
Reputation: 754700
The 'RE' makes sense if we interpret the 'U' as being similar to set union. However, it is more conventional to use a '|' symbol to denote alternatives.
First, you are only dealing with integer constants, not with floating point or character or string constants, let alone more complex constants.
Second, you have omitted '0X' as a valid hex prefix.
Third, you have omitted the various suffixes: U, L, LL, ULL (and their lower-case and mixed-case synonyms and permutations).
Also, the C standard (§6.4.4.1) distinguishes between digits and non-zero digits in a decimal constant:
decimal-constant:
    nonzero-digit
    decimal-constant digit
Any integer constant starting with a zero is an octal constant, never a decimal constant. In particular, writing 0 is writing an octal constant.
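The prefix rules above can be sketched as a small classifier. This is only an illustration in Python: the function name `constant_base` is hypothetical, and it assumes the suffix has already been stripped from the input.

```python
import re
from typing import Optional

_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
_DEC = re.compile(r"[1-9][0-9]*")   # must start with a nonzero digit
_OCT = re.compile(r"0[0-7]*")       # a lone 0 falls in this category


def constant_base(digits: str) -> Optional[str]:
    """Classify an unsuffixed C integer constant by its prefix, or return None."""
    if _HEX.fullmatch(digits):
        return "hexadecimal"
    if _DEC.fullmatch(digits):
        return "decimal"
    if _OCT.fullmatch(digits):
        return "octal"
    return None
```

With these patterns, `constant_base("0")` yields `"octal"`, while `constant_base("09")` yields `None` because 9 is not an octal digit.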
Upvotes: 2
Reputation: 78953
There is another type of integer constant, namely integer character constants such as 'a' or '\n'. In C99 these are constants and their type is just int.
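A possible pattern for such character constants, sketched in Python, follows. The names `CHAR_CONSTANT` and `is_char_constant` are invented for this example. It assumes an optional `L`, `u`, or `U` prefix and covers simple, octal, and hexadecimal escapes. Universal character names (`\u`, `\U`) are left out to keep it short.

```python
import re

CHAR_CONSTANT = re.compile(
    r"""
    [LuU]?                       # optional prefix (assumed: L, u or U)
    '
    (?: [^'\\\n]                 # any character except quote, backslash, newline
      | \\ (?: ['"?\\abfnrtv]    # simple escape sequence
             | [0-7]{1,3}        # octal escape, one to three digits
             | x[0-9a-fA-F]+     # hexadecimal escape
           )
    )+
    '
    """,
    re.VERBOSE,
)


def is_char_constant(text: str) -> bool:
    """Return True if the whole string has the form of a C character constant."""
    return CHAR_CONSTANT.fullmatch(text) is not None
```

The `+` after the group accepts multi-character constants such as `'ab'`, which C permits with an implementation-defined value, while still rejecting the empty `''`.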
The best regular expressions for all these are found in the standard, section 6.4, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Upvotes: 8