Reputation: 785
Does a configuration parsing library exist already that will read the following style of file:
Keyword Label Value;
with nesting by { } replacing Values; optional Labels; support for "Include" would be nice.
An example configuration file might look like:
Listen Inside 127.0.0.1:1000;
Listen Outside {
    IP 1.2.3.4;
    Port 1000;
    TLS {
        CertFile /path/to/file;
    };
};
ACL default_acl {
    IP 192.168.0.0/24;
    IP 10.0.0.0/24;
};
Upvotes: 1
Views: 1102
Reputation: 4290
What programming languages are you familiar with? My impression from your question is C.
It looks like the tokens of your configuration language can be described by regular expressions, and almost all modern programming languages have some form of support for those.
If the implementation is C, I'd probably use flex. It generates a function which will apply a set of regular expressions, put the matched text into a C string, and return the type of that regular expression (just an int, which you choose). The function is a 'lexical analyser' or 'tokeniser'. It chops up streams of characters into handy units that match your needs, one regular expression at a time.
Flex is pretty easy to use. It has several advantages over lex. One is that you can have multiple lexical analyser functions, so if you need to do something odd for an include file, then you could have a second lexical analyser function for that job.
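Another way, staying inside a single scanner, is flex's buffer stack. The following is only a sketch of rules that could be dropped into the example further down this answer; it assumes an "Include /path/to/file;" syntax, which the question does not actually define (it only says include "would be nice"). The yypush_buffer_state/yypop_buffer_state calls are real flex API.

Include[ \t]+[^;\n]+";"   {
        /* Assumed syntax: Include /path/to/file;
         * Extract the path between the keyword and the ';', open it,
         * and stack a new flex input buffer on top of the current one. */
        char *path = yytext + 7;                /* skip "Include" */
        while (*path == ' ' || *path == '\t')
            path++;
        yytext[yyleng - 1] = '\0';              /* drop the trailing ';' */
        FILE *inc = fopen(path, "r");
        if (!inc) {
            fprintf(stderr, "cannot open include file %s\n", path);
            exit(1);
        }
        yypush_buffer_state(yy_create_buffer(inc, YY_BUF_SIZE));
    }

<<EOF>>   {
        yypop_buffer_state();                   /* back to the including file */
        if (!YY_CURRENT_BUFFER)
            yyterminate();                      /* no buffers left: real end of input */
    }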
Your language looks simple. Bison/Yacc are very powerful tools, and "with great power comes great responsibility" :-)
I think it is sufficiently simple that I might just write a parser by hand. It might only be a few functions to handle its structure. A very straightforward technique is called recursive descent parsing. Have you got a CS degree, or do you understand this stuff?
Lots of people will (at this stage) tell you to get the 'Dragon Book' or one of its newer versions, often because that is what they had at college. The Dragon book is great, but it is like telling someone to read all of Wikipedia to find out about whales. Great if you have the time, and you'll learn a lot.
A reasonable start is the Wikipedia Recursive Descent parser article. Recursive descent is very popular because it is relatively straightforward to understand. The thing that makes it straightforward is to have a proper grammar which is cast into a form which is easy for recursive descent to parse. Then you literally write a function for every rule, with a simple error handling mechanism (that's why I asked about this). There are probably tools to generate them, but you might find it quicker to just write it. A first cut might take a day, then you'd be in a good position to decide.
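To make that concrete, here is a minimal sketch of what a hand-written recursive descent parser for this shape of file might look like. It assumes the TOK_* values and the yylex() tokeniser from the flex example further down this answer, and would be used in place of the dummy main there; the parse_* and advance/expect names are invented for illustration.

/* Sketch only: recursive descent for "Keyword [Label] (Value | { ... }) ;" */
#include <stdio.h>
#include <stdlib.h>
#include "config.h"          /* the TOK_* enum shown later in this answer */

extern int yylex(void);
extern char *yytext;

static int tok;              /* one token of lookahead */

static void advance(void) { tok = yylex(); }

static void expect(int wanted, const char *what)
{
    if (tok != wanted) {
        fprintf(stderr, "expected %s, got '%s'\n", what, yytext);
        exit(1);
    }
    advance();
}

static void parse_statement(void);

/* block := '{' statement* '}' */
static void parse_block(void)
{
    expect(TOK_OPEN_BRACE, "'{'");
    while (tok != TOK_CLOSE_BRACE && tok != 0)
        parse_statement();
    expect(TOK_CLOSE_BRACE, "'}'");
}

/* statement := keyword [label] (value | block) ';' */
static void parse_statement(void)
{
    if (tok != TOK_KEYWORD_LISTEN && tok != TOK_IDENTIFIER) {
        fprintf(stderr, "expected a keyword, got '%s'\n", yytext);
        exit(1);
    }
    printf("keyword: %s\n", yytext);
    advance();

    if (tok == TOK_IDENTIFIER) {            /* the optional label */
        printf("  label: %s\n", yytext);
        advance();
    }
    if (tok == TOK_OPEN_BRACE) {
        parse_block();                      /* nesting replaces the value */
    } else if (tok != TOK_SEMICOLON) {
        printf("  value: %s\n", yytext);    /* e.g. 127.0.0.1:1000 */
        advance();
    }
    expect(TOK_SEMICOLON, "';'");
}

int main(void)
{
    advance();                              /* prime the lookahead */
    while (tok != 0)
        parse_statement();
    return 0;
}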
A very nifty lex/flex feature is that any characters which are not matched are just echoed to standard output. So you can see what your regular expressions are matching, and can add them incrementally. When the output 'dries up', everything is being matched.
Pontification alert: IMHO, more C programmers should learn to use flex. It is relatively easy to use, and very powerful for text handling. IMHO lots are put off because they are also told to use yacc/bison which are much more powerful, subtle and complex tools. end Pontification.
If you need a bit of help with the grammar, please ask. If there is a nice grammar (might not be the case, but so far your examples look okay) then implementation is straightforward.
Here is an example of using flex.
Flex takes a 'script', and generates a C function called yylex(). This is the input script.
Remember that all of the regular expressions are being matched within that yylex function, so though the script looks weird, it is really an ordinary C function. To tell the caller, which will be your recursive descent parser, what type of regular expression is matched, it returns an integer value that you choose, just like any ordinary C function.
If there is nothing to tell the parser about, like white space, and probably some form of comment, it doesn't return. It 'silently' consumes those characters. If the syntax needs to use newline, then that would be recognised as a token, and a suitable token value returned to the parser. It is sometimes easier to let it be more free form, so this example consumes and ignores all white space.
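For example, if comments were, say, '#' to the end of the line (the question does not define a comment syntax, so this is only an assumption), a single rule like this would silently consume them, just like the whitespace rule in the example below:

#[^\n]*     /* assumed comment syntax: consume to end of line, return nothing */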
Effectively the yylex function is everything from the first %% to the second %%. It behaves like a big switch() statement. The regular expressions are like (very exotic) case: labels. The code inside the { ... } is ordinary C. It can contain any C statements, and must be properly nested within the { ... }.
The stuff before the first %% is the place to put flex definitions, and a few 'instructions' to flex.
The stuff inside %{ ... %} is ordinary C, and can include any headers needed by the C in the file, or even define global variables.
The stuff after the second %% is ordinary C, with no need for extra syntax, so no %{ ... %}.
/* scanner for a configuration file */
%{
/* Put headers in here */
#include "config.h"   /* the TOK_* enum below */
%}

%%

[0-9]+                                        { return TOK_NUMBER; }
[0-9]+"."[0-9]+"."[0-9]+"."[0-9]+":"[0-9]+    { return TOK_IP_PORT; }
[0-9]+"."[0-9]+"."[0-9]+"."[0-9]+"/"[0-9]+    { return TOK_IP_RANGE; }
"Listen"                                      { return TOK_KEYWORD_LISTEN; }
[A-Za-z][A-Za-z0-9_]*                         { return TOK_IDENTIFIER; }
"{"                                           { return TOK_OPEN_BRACE; }
"}"                                           { return TOK_CLOSE_BRACE; }
";"                                           { return TOK_SEMICOLON; }
[ \t\n]+                                      /* eat up whitespace, do nothing */
.                                             { fprintf(stderr, "Unrecognized character: %s\n", yytext);
                                                exit(1);
                                              }

%%

/* -------- A simple test ----------- */
int main(int argc, char *argv[])
{
    int tok;

    yyin = stdin;
    while ((tok = yylex()) != 0) {
        fprintf(stderr, "%d %s\n", tok, yytext);
    }
    return 0;
}
That has a minimal, dummy main, which calls the yylex() function to get the next token (enum) value. yytext is the string matched by the regular expression, so main just prints it.
WARNING, this is barely tested, little more than:
flex config.l
gcc lex.yy.c -lfl
./a.out <tinytest
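For instance, if tinytest contained just the first line of the question's configuration, Listen Inside 127.0.0.1:1000;, that test main would print one line per token, using the numbers from the enum below:

256 Listen
257 Inside
261 127.0.0.1:1000
260 ;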
The values are just integers, so an enum in a header:
#ifndef _CONFIG_H_
#define _CONFIG_H_

/* Token types returned by yylex(); starting at 256 keeps the values clear of
   the range of plain characters. */
enum TOKENS {
    TOK_KEYWORD_LISTEN = 256,
    TOK_IDENTIFIER     = 257,
    TOK_OPEN_BRACE     = 258,
    TOK_CLOSE_BRACE    = 259,
    TOK_SEMICOLON      = 260,
    TOK_IP_PORT        = 261,
    TOK_IP_RANGE       = 262,
    TOK_NUMBER         = 263
};

#endif /* _CONFIG_H_ */
In your parser, call yylex when you need the next value. You'll probably wrap yylex in something which copies yytext before handing the token type value back to the parser.
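A sketch of such a wrapper (next_token and token_text are invented names here) might be:

#include <stdlib.h>
#include <string.h>

extern int yylex(void);
extern char *yytext;

static char *token_text;      /* private copy of the current token's text */

/* Fetch the next token; keep a copy of its text, because flex overwrites
 * yytext on every call. */
static int next_token(void)
{
    int type = yylex();
    free(token_text);
    token_text = (type != 0) ? strdup(yytext) : NULL;
    return type;
}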
You will need to be comfortable handling memory. If this were a large file, maybe use malloc to allocate space. But for small files, and to make it easy to get started and debug, it might make sense to write your own 'dumb' allocator. A 'dumb' memory management system can make debugging much easier. Initially just have a big char array statically allocated and a mymalloc() handing out pieces of it. I can imagine the configuration data never gets free()'d. Everything can be held in C strings initially, so it is straightforward to debug because the exact sequence of input is in the char array. An improved version might stat() the file and allocate a piece big enough.
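A sketch of that sort of 'dumb' allocator, assuming a fixed-size static arena is plenty for a configuration file (the 64 KiB size is an arbitrary choice):

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

static char arena[64 * 1024];     /* one static block, never free()'d */
static size_t arena_used;

/* Hand out pieces of the static array; give up if the file is too big. */
static void *mymalloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;            /* keep returned pointers aligned */
    if (arena_used + size > sizeof arena) {
        fprintf(stderr, "configuration too large\n");
        exit(1);
    }
    void *p = arena + arena_used;
    arena_used += size;
    return p;
}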
How you deal with the actual configuration values is a bit beyond what I can describe. Text strings might be all that is needed, or maybe there is already a mechanism for that. Often there is no need to store the text value of 'Keywords', because the parser has recognised what it means, and the program might convert other values, e.g. IP addresses, into some internal representation.
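As one example of such an internal representation, an IPv4 range like 192.168.0.0/24 could be converted with the standard inet_pton(); a rough sketch only, and parse_ip_range is an invented name:

#include <arpa/inet.h>
#include <stdio.h>

/* Convert the text of a TOK_IP_RANGE token, e.g. "192.168.0.0/24",
 * into a binary address plus prefix length. Returns 0 on success. */
static int parse_ip_range(const char *text, struct in_addr *addr, int *prefix)
{
    char ip[INET_ADDRSTRLEN];

    if (sscanf(text, "%15[0-9.]/%d", ip, prefix) != 2)
        return -1;
    return inet_pton(AF_INET, ip, addr) == 1 ? 0 : -1;
}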
Upvotes: 2
Reputation: 1213
Have you looked at lex and yacc (or alternatively, flex and bison)? It's a little hairy, but we use those to parse files that look exactly like your config file there. You can define sub-structures using brackets, parse variable-length lists with the same key, etc.
By labels do you mean comments? You can define your own comment structure, we use '#' to denote a comment line.
It doesn't support includes AFAIK.
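For a flavour of what that looks like, here is a rough yacc/bison grammar for the question's shape of file. This is only a sketch, not the grammar this answer refers to; the token names are placeholders, and the lexer is assumed to supply them.

%token KEYWORD LABEL VALUE

%%
config      : /* empty */
            | config statement
            ;

statement   : KEYWORD opt_label body ';'
            ;

opt_label   : /* empty */
            | LABEL
            ;

body        : VALUE
            | '{' config '}'          /* nesting replaces a value */
            ;
%%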
Upvotes: 1
Reputation: 186
C libraries exist for JSON and YAML. They look like what you need.
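For comparison, the question's "Listen Outside" and "ACL" blocks might be written in JSON roughly like this; note that the repeated IP lines would have to become an array:

{
  "Listen": {
    "Outside": {
      "IP": "1.2.3.4",
      "Port": 1000,
      "TLS": { "CertFile": "/path/to/file" }
    }
  },
  "ACL": {
    "default_acl": { "IP": ["192.168.0.0/24", "10.0.0.0/24"] }
  }
}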
Upvotes: 0