rojcyk

Reputation: 247

How to convert string with escape sequence to one char in C

Just to give you some background: we have a school project where we need to write our own compiler in C. My task is to write the lexical analyzer. So far so good, but I am having some difficulties with escape sequences.

When I find an escape sequence and it is valid, I save it in a string that looks like this: \xAF. Otherwise it is a lexical error.

My problem is: how do I convert a string containing only an escape sequence to one char, so that I can add it to the "buffer" containing the rest of the string?

I had an idea about a massive table containing every escape sequence and comparing against it one entry at a time, but that does not seem elegant.

Upvotes: 1

Views: 437

Answers (3)

askmish

Reputation: 6684

flex has start conditions, which enable contextual analysis. For instance, the flex manual has an example for scanning C comments (between /* and */):

<INITIAL>"/*"   BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/"            BEGIN(INITIAL);
[^*\n]+         /* eat comment in chunks */
"*"             /* eat the lone star */
\n              yylineno++;
}

Start conditions also enable string literal analysis. The Start Conditions section of the flex manual has an example of matching C-style quoted strings, and the manual's FAQ has an item titled "How do I expand backslash-escape sequences in C-style quoted strings?". These will probably answer your question.

Upvotes: -1

Some programmer dude

Reputation: 409472

This approach works for numeric escape sequences of any length and type: octal, hexadecimal, and others.

When you see a '\', check the next character. If it's an 'x' (or 'X'), read one character; if that is a hexadecimal digit (isxdigit), read another. If the second character is not a hexadecimal digit, put it back into the stream (an "unget" operation) and use only the first digit you read.

Put each digit you read into a string, then use e.g. strtoul to convert that string into a number. Put that number directly into the token value.

For octal sequences, read up to three octal digits instead.
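The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: it works on an already-extracted string such as \xAF (as in the question) rather than on a character stream, so there is no "unget"; the function name decode_numeric_escape is made up for this example; and it follows the limits described above (at most two hexadecimal digits or three octal digits), rejecting values that do not fit in one byte.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not from any library): decodes a numeric escape such
 * as "\xAF" or "\101" stored in the NUL-terminated string esc.
 * Returns the value 0..255, or -1 if the sequence is malformed. */
int decode_numeric_escape(const char *esc)
{
    char digits[4] = {0};   /* up to 3 digits plus the terminating NUL */
    size_t n = 0;
    int base;

    if (esc[0] != '\\')
        return -1;
    esc++;

    if (*esc == 'x' || *esc == 'X') {
        /* Hexadecimal: collect one or two hex digits. */
        base = 16;
        esc++;
        while (n < 2 && isxdigit((unsigned char)*esc))
            digits[n++] = *esc++;
    } else {
        /* Octal: collect one to three octal digits. */
        base = 8;
        while (n < 3 && *esc >= '0' && *esc <= '7')
            digits[n++] = *esc++;
    }

    /* No digits at all, or trailing characters: treat as a lexical error. */
    if (n == 0 || *esc != '\0')
        return -1;

    unsigned long value = strtoul(digits, NULL, base);
    return value <= 0xFF ? (int)value : -1;
}
```

The caller can then append (char)result to the string buffer whenever the result is not -1.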


For an example of a similar method, see this old lexer I made many years ago; search for the lex_getescape function. Note that it uses direct arithmetic instead of strtoul to convert the escape code into a number, and it does not use the standard isxdigit etc. functions either.

Upvotes: 3

Roee Gavirel

Reputation: 19473

You can use the following code; call xString2char with your string.

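/* Convert one hexadecimal digit to its numeric value (0-15). */
char x2char(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;   /* 'a'..'f' map to 10..15 */
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;   /* 'A'..'F' map to 10..15 */
    /* If we got here it's an error - handle it as you like. */
    return 0;
}

/* Convert a string of the form "\xHH" to the single char it denotes. */
char xString2char(const char* buf)
{
    unsigned char ans;         /* unsigned so the shift cannot overflow */
    ans = x2char(buf[2]);      /* high nibble */
    ans <<= 4;
    ans += x2char(buf[3]);     /* low nibble */
    return (char)ans;
}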

This should work, just add the error checking & handling (in case you didn't already validate them in your code)
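As one possible way to add that error checking, here is a minimal sketch. The names hex_digit_value and checked_xstring_to_char are invented for this illustration, and it assumes the input must be exactly of the form \xHH (two hexadecimal digits). It returns an int so that -1 can signal a malformed sequence.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical wrapper: expects exactly "\xHH".
 * Returns the byte value 0..255, or -1 on malformed input. */
int checked_xstring_to_char(const char *buf)
{
    if (buf == NULL || buf[0] != '\\' || (buf[1] != 'x' && buf[1] != 'X'))
        return -1;

    /* Check each digit before touching the next index, so a short
     * string is rejected without reading past its terminating NUL. */
    int hi = hex_digit_value(buf[2]);
    if (hi < 0)
        return -1;
    int lo = hex_digit_value(buf[3]);
    if (lo < 0)
        return -1;
    if (buf[4] != '\0')
        return -1;

    return (hi << 4) | lo;
}
```

A lexer could report a lexical error when this returns -1 and otherwise append (char)result to its buffer.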

Upvotes: 2
