changelog
changelog

Reputation: 4681

Ragel FSM for parsing SQL-like statements

I'm having a bit of a problem with Ragel, mostly due to still trying to grasp how the whole thing works.

I'm trying to make a simple parser for a language similar to SQL (but less flexible), where you have functions (all uppercase), identifiers (all lowercase) and where you could nest functions within functions.

Here's what I have so far:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef struct Parser {
  int current_line;
  int nesting;

  /* Ragel FSM */
  int cs;
  const char *ts;
  const char *te;
  int act;
} Parser;

%%{
  machine gql;
  access parser->;

  Function    = [A-Z][A-Z_]+ ;

  Identifier  = [a-z][a-z_]+ ;
  Integer     = [0-9]+ ;

  Parameter = ( Identifier | Integer )+ ;

  WhiteSpace  = [ \t\r\n] ;

  action function_call {
    parser->nesting++;
    printf("FUNCTION CALL\n");
  }

  action function_finish {
    parser->nesting--;
    printf("FUNCTION FINISH!\n");
  }

  action function_add_identifier {
    printf("FUNCTION ADD IDENTIFIER\n");
  }

  FunctionCall =
    Function @function_call WhiteSpace* "("
      Parameter %function_add_identifier
      ( WhiteSpace* ',' WhiteSpace* Parameter %function_add_identifier )* WhiteSpace*
    %function_finish ')' ;

  main := FunctionCall ;
}%%


%% write data;

void Parser_Init(Parser *parser) {
  parser->current_line  = 1;
  parser->nesting       = 0;
  %% write init;
}

void Parser_Execute(Parser *parser, const char *buffer, size_t len) {
  if(len == 0) return;

  const char *p, *pe, *eof;
  p   = buffer;
  pe  = buffer+len;
  eof = pe;

  %% write exec;
}

int main(int argc, char *argv[]) {
  Parser *parser = malloc(sizeof(Parser));
  Parser_Init(parser);

  printf("Parsing:\n%s\n\n\n", argv[1]);

  Parser_Execute(parser, argv[1], sizeof(argv[1]));

  printf("Parsed %d lines\n", parser->current_line);
  return 0;
}

It is calling the function_call action once per character, not picking up the Parameters, and I can't think how to make functions work inside functions.

Any tips on what I'm doing wrong here?

Upvotes: 1

Views: 773

Answers (1)

maxschlepzig
maxschlepzig

Reputation: 39065

The standard approach is to create a lexer (written in Ragel or GNU Flex) that just tokenizes your language input. The tokens are then consumed by a parser (not written in Ragel) that is able to parse recursive structures (e.g. nested functions) - with a parser generator like GNU Bison.

Note that Ragel includes (as advanced feature) directives to manage a stack (which enables you to parse recursive structures)- but with that you leave the domain of regular languages you otherwise are working with in ragel specifications. Thus, you could write a parser that is able to parse nested functions completely with Ragel. But a properly layered architecture (1st layer: lexer, 2nd layer: parser, ...) simplifies the task, i.e. the parts are easier to debug, test and maintain.

Upvotes: 4

Related Questions