Ted Mosby

Reputation: 73

I want to create a program that checks if the input is syntactically valid in C

How do I create a C program that takes a C statement as input and prints whether it contains any errors?

For example,

    Input: for(i=0; i<5; i++);
    No errors.

    Input: for((i=0); i>2,(i<5) ; i++);
    No errors.

    Input: for(i=0, i<5; i++);
    Error.

    Input: for((i=0; i<5; i++));
    Error.

I think the shortest way to do it would be to compile the input string and check whether any errors occur. But I don't know how to compile a piece of code supplied at runtime. There is also the problem of undeclared variables.

Upvotes: 0

Views: 237

Answers (1)

Remo.D

Reputation: 16512

C is particularly complicated to parse. Even if you set aside the preprocessor, there are still tricky parts: for example, whether `a * b;` is a declaration or a multiplication depends on whether `a` names a type.

If you are learning this for yourself, you can study one of the existing C grammars (there are some in lex/yacc format for ANSI C).

From your example I guess you want to be able to recognize just a subset of C, not the entire language, correct? If this is the case, you should define that subset and write a parser for it.
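To give an idea of what "defining the subset" could look like, here is a minimal hand-written sketch in C that only recognizes lines of the form `for (...; ...; ...);` like the ones in your question. The grammar in the comments and the names (`check_for`, `eat`, and so on) are my own assumptions, not anything standard. It only knows identifiers, integers, parentheses, `=`, the relational operators, `++`/`--` and the comma operator, and it does no lvalue or declaration checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Cursor into the text being checked. */
static const char *p;

static void skip_ws(void) {
    while (isspace((unsigned char)*p)) p++;
}

/* Consume the literal `tok` if it comes next; return 1 on success. */
static int eat(const char *tok) {
    size_t n = strlen(tok);
    skip_ws();
    if (strncmp(p, tok, n) != 0) return 0;
    p += n;
    return 1;
}

static int expr(void);

/* primary <- identifier / number / '(' expr ')' */
static int primary(void) {
    skip_ws();
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        return 1;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        return 1;
    }
    return eat("(") && expr() && eat(")");
}

/* postfix <- primary ('++' / '--')* */
static int postfix(void) {
    if (!primary()) return 0;
    while (eat("++") || eat("--")) { /* keep consuming */ }
    return 1;
}

/* rel <- postfix (relop postfix)*   (longer operators are tried first) */
static int rel(void) {
    static const char *ops[] = { "<=", ">=", "==", "!=", "<", ">" };
    if (!postfix()) return 0;
    for (;;) {
        size_t i;
        for (i = 0; i < 6; i++) if (eat(ops[i])) break;
        if (i == 6) return 1;          /* no operator follows: done */
        if (!postfix()) return 0;      /* operator without right operand */
    }
}

/* assign <- rel ('=' assign)?   (simplified: the left side is not checked) */
static int assign(void) {
    if (!rel()) return 0;
    return eat("=") ? assign() : 1;
}

/* expr <- assign (',' assign)* */
static int expr(void) {
    if (!assign()) return 0;
    while (eat(",")) if (!assign()) return 0;
    return 1;
}

/* A clause of the for header may be empty. */
static int opt_expr(void) {
    skip_ws();
    if (*p == ';' || *p == ')') return 1;
    return expr();
}

/* stmt <- 'for' '(' expr? ';' expr? ';' expr? ')' ';' end-of-input
   Returns 1 if `src` matches, 0 otherwise. */
int check_for(const char *src) {
    assert(src != NULL);
    p = src;
    if (!(eat("for") && eat("(") && opt_expr() && eat(";") && opt_expr() &&
          eat(";") && opt_expr() && eat(")") && eat(";")))
        return 0;
    skip_ws();
    return *p == '\0';
}
```

You would call `check_for(line)` for each line read with `fgets` and print "No errors." or "Error." depending on the result. Every extra construct you want to accept means another function like these, which is why this approach stops scaling quickly.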

Rather than trying to write a parser by hand, you should learn about grammars and parser generators if you haven't worked with them already.

If it's a subset of C you're interested in, I would use a PEG parser generator like PackCC to get a parser quickly.

As a super-simple example, the following parser:

    %prefix "mc"

    stmt <- _ assign            { printf("assignment\n"); }
         / _ if                 { printf("if\n"); }
         / ( !EOL . )* EOL      { printf("error\n"); }

    if <- 'if' _ '(' _ var _ ')' _ stmt

    assign <- var _ '=' _ num _ ';' _ EOL
    var    <- [A-Za-z_] [0-9A-Za-z_]*
    num    <- [0-9]+
    _      <- [ \t]*
    EOL    <- '\n' / '\r\n' / '\r'

    %%
    int main() {
        mc_context_t *ctx = mc_create(NULL);
        while (mc_parse(ctx, NULL));
        mc_destroy(ctx);
        return 0;
    }

will accept assignments (where the lvalue is a variable name and the rvalue an integer) and if statements whose condition is a variable name. Assuming the grammar is saved in the file mu.peg, you can build and try it like this:

    home> packcc mu.peg
    home> gcc -o mu mu.c
    home> ./mu
    t = 5;
    assignment
    t = 5
    error
    if (x) p = 3;
    assignment
    if
    if (x) if (y) t = 3;
    assignment
    if
    if

If, instead, you need to check at runtime whether arbitrary C code is syntactically valid and there's a compiler installed on your system, typically GCC or Clang, you can call it via system() and check whether it reports any errors. Depending on the kind of checks you want to run on the code, you could also consider a static analyzer like splint.
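A rough sketch of the system() route, under a few assumptions of mine: `gcc` is on the PATH, the shell is POSIX-like (for the `2>/dev/null` redirection), and the scratch file name `snippet_check.c` and the function name `compiles_ok` are made up for the example. The statement is wrapped in a function that declares `int i;`, which only sidesteps the undeclared-variable problem for inputs like the ones in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the compiler accepts `stmt`, 0 if the compiler command
   exits with a non-zero status (which also happens when gcc is missing),
   and -1 if the check itself could not be run. */
int compiles_ok(const char *stmt) {
    const char *path = "snippet_check.c";   /* hypothetical scratch file */
    FILE *f;
    int status;

    assert(stmt != NULL);
    f = fopen(path, "w");
    if (!f) return -1;

    /* Wrap the statement in a function body. Declaring `i` here is an
       assumption that fits the question's examples, nothing more. */
    fprintf(f, "void snippet(void) {\n    int i;\n    %s\n}\n", stmt);
    if (fclose(f) != 0) return -1;

    /* -fsyntax-only asks the compiler to check the code without
       generating any output file; diagnostics are discarded here. */
    status = system("gcc -fsyntax-only snippet_check.c 2>/dev/null");
    remove(path);

    if (status == -1) return -1;
    return status == 0;
}
```

Call `compiles_ok(line)` for each input line and print "No errors." or "Error." accordingly. Note that despite its name, `-fsyntax-only` still reports semantic errors such as undeclared identifiers, so any variable other than `i` would have to be declared in the wrapper as well. A fixed file name in the current directory is fine for a toy; a real program would want a proper temporary file.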

If you need to embed everything in your application, you can try tcc, which is also available as a library (libtcc).

Upvotes: 3
