Reputation: 247
Just to give you some background: we have a school project where we need to write our own compiler in C. My task is to write the lexical analyzer. So far so good, but I am having some difficulties with escape sequences.
When I find an escape sequence and it is valid, I save it in a string that looks like this: \xAF. Otherwise it is a lexical error.
My problem is: how do I convert a string containing only an escape sequence into a single char, so that I can add it to the "buffer" containing the rest of the string?
I had an idea about a massive table containing all the escape sequences and comparing against it one by one, but that does not seem elegant.
Upvotes: 1
Views: 437
Reputation: 6684
flex has a start condition. This enables contextual analysis.
For instance, there is an example for C comment analysis (between /* and */) in the flex manual:
<INITIAL>"/*" BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/" BEGIN(INITIAL);
[^*\n]+ /* eat comment in chunks */
"*" /* eat the lone star */
\n yylineno++;
}
Start conditions also enable string literal analysis. The Start Conditions section of the flex manual has an example of how to match C-style quoted strings, and the manual also has a FAQ item titled "How do I expand backslash-escape sequences in C-style quoted strings?"
This will probably answer your question.
Upvotes: -1
Reputation: 409472
This solution can be used for numerical escape sequences of all lengths and types: octal, hexadecimal and others.
What you do when you see a '\' is to check the next character. If it's an 'x' (or 'X') then you read one character; if it's a hexadecimal digit (isxdigit) then you read another. If the last one is not a hexadecimal digit then put it back into the stream (an "unget" operation), and use only the first digit you read.
Each digit you read you put into a string, and then you can use e.g. strtol to convert that string into a number. Put that number directly into the token value.
For octal sequences, just read up to three characters instead.
For an example of a similar method see this old lexer I made many years ago. Search for the lex_getescape function. Note that it uses direct arithmetic instead of strtoul to convert the escape code into a number, and it does not use the standard isxdigit etc. functions either.
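The steps above can be sketched roughly as follows. This is only an illustration, not the code from the linked lexer: it assumes the lexer reads from a FILE * and has already consumed the backslash, and read_numeric_escape is a made-up name.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the linked lexer. Call it right after the
 * lexer has consumed a backslash. Returns the value of a hex (\xH, \xHH)
 * or octal (\O, \OO, \OOO) escape, or -1 if no valid digit follows. */
int read_numeric_escape(FILE *in)
{
    char digits[4] = {0};      /* up to three digits plus the terminator */
    int base = 8, max_digits = 3, n = 0;
    int c = getc(in);

    if (c == 'x' || c == 'X') {
        base = 16;
        max_digits = 2;
        c = getc(in);
    }

    while (n < max_digits && c != EOF) {
        int ok = (base == 16) ? isxdigit(c) : (c >= '0' && c <= '7');
        if (!ok) {
            ungetc(c, in);     /* not part of the escape: put it back */
            break;
        }
        digits[n++] = (char)c;
        if (n < max_digits)
            c = getc(in);
    }

    if (n == 0)
        return -1;             /* the caller reports a lexical error */
    return (int)strtol(digits, NULL, base);
}
```

When the character after the backslash is not 'x' or an octal digit, it is put back and -1 is returned, so the caller can still handle simple escapes such as \n itself. A three-digit octal escape can exceed 255, so the caller may want to range-check the result before storing it in a char.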
Upvotes: 3
Reputation: 19473
You can use the following code: call xString2char with your string.
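int x2char(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;   // 'a'..'f' map to 10..15
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;   // 'A'..'F' map to 10..15
    //if we got here it's an error - handle it as you like...
    return -1;
}
char xString2char(const char* buf)
{
    // buf has the form "\xHH": the hex digits are at buf[2] and buf[3]
    unsigned char ans;
    ans = (unsigned char)x2char(buf[2]);
    ans <<= 4;
    ans += (unsigned char)x2char(buf[3]);
    return (char)ans;
}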
This should work; just add error checking and handling, in case you haven't already validated the sequence in your code.
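If the string has not been validated yet, the checks could look something like the sketch below. The names hex_digit_value and hex_escape_to_byte are made up for illustration, and the sketch assumes the string has exactly the form \xHH; it returns an int so that -1 can signal an error without clashing with a valid byte value.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit (assumes isxdigit(c) holds). */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Hypothetical validated variant: converts a string of the exact form
 * "\xHH" to its byte value (0..255), or returns -1 on malformed input. */
int hex_escape_to_byte(const char *buf)
{
    if (buf == NULL || strlen(buf) != 4)
        return -1;
    if (buf[0] != '\\' || (buf[1] != 'x' && buf[1] != 'X'))
        return -1;
    if (!isxdigit((unsigned char)buf[2]) || !isxdigit((unsigned char)buf[3]))
        return -1;
    return hex_digit_value(buf[2]) * 16 + hex_digit_value(buf[3]);
}
```

The caller can cast a non-negative result to char before appending it to the string buffer, and report a lexical error when it gets -1.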
Upvotes: 2