Ragel is a powerful tool, but I have trouble with 'optional' elements in a grammar. I have simple lines made of numbers or strings. The trouble is with whitespace: I don't know how to correctly allow optional whitespace between ',' and a variable. A newline (enter) can appear anywhere between tokens. A line ends with ';' or a newline. I need to use the $err() function for errors.
This is my test set. Good input:
this , is , a , test ; and, this,
is,ok
next, trouble
How,produce,good
grammar;
ok
output:
line(this,is,a,test)
line(and,this,is,ok)
line(next,trouble)
line(How,produce,good)
line(grammar)
line(ok)
and these should fail ('this not' has no ',' between the words; ',,' has no number or variable between the commas):
this not , working
and,
this,, too
When I use this grammar, I get separate characters or an error at the end of the line:
whitespace = [ \t\v\f] ;
enter = [\r\n] ;
string = (alnum | '_')+ ;
number = ('+'|'-')?[0-9]+'.'[0-9]+( [eE] ('+'|'-')? [0-9]+ )? ;
var = string | number ;
koniec = (';' | enter) ;
line = var whitespace* ( ',' whitespace* var )* whitespace* koniec ;
main := whitespace* ( line )* ;
This is my whole code: https://github.com/and09/simple_grammar
It's a bit hard to give definitive answers when you don't have a full specification of your grammar, but let's at least try to make your example work the way you want it to and then you should be able to correct it if needed.
Your full example from GitHub has some printing actions in it, and they actually tell a lot about what's going on in the state machine (the other thing you should check periodically while working with Ragel is the state machine graph that it can produce for you). With its initial specification (the same as in the question), it outputs the following when run:
[this]< >,< >[is]
So it has a problem getting to the third variable. Why is that? Well, that's because your line only specifies one ( ',' whitespace* var) element. If you try to fix that by specifying ( ',' whitespace* var)*, that won't work either, because now you're demanding that your var be immediately followed by a comma on repetition, without any whitespace. Let's try this (actions intentionally removed), moving whitespace into the repeating group:
line = var whitespace* ( ',' whitespace* var whitespace*)* koniec;
Now you get this in the output:
[this]< >,< >[is]< >,< >[a]< >< >< >,< >[test]< >
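The effect of moving whitespace* into the repeating group can be checked outside Ragel as well. Below is a rough Python sketch that transcribes the definition from the question and the adjusted one into regular expressions. It is only a model under that assumption: `re` is not Ragel, no actions run, and the names ORIGINAL_LINE, FIXED_LINE and accepts are made up for this illustration.

```python
import re

# Building blocks transcribed from the Ragel definitions in the question.
WS = r"[ \t\v\f]"
STRING = r"[A-Za-z0-9_]+"
NUMBER = r"[+-]?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?"
VAR = rf"(?:{NUMBER}|{STRING})"
KONIEC = r"[;\r\n]"

# line = var whitespace* ( ',' whitespace* var )* whitespace* koniec ;
ORIGINAL_LINE = re.compile(rf"{VAR}{WS}*(?:,{WS}*{VAR})*{WS}*{KONIEC}")

# line = var whitespace* ( ',' whitespace* var whitespace* )* koniec ;
FIXED_LINE = re.compile(rf"{VAR}{WS}*(?:,{WS}*{VAR}{WS}*)*{KONIEC}")


def accepts(pattern, text):
    """Return True if the whole text is one line according to the pattern."""
    return pattern.fullmatch(text) is not None


sample = "this , is , a , test ;"
print(accepts(ORIGINAL_LINE, sample))  # False: whitespace before the 2nd ',' is not allowed
print(accepts(FIXED_LINE, sample))     # True
```

Both patterns accept input without spaces around the commas, such as "a,b;". Only the position of the optional whitespace differs.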
Which is an obvious improvement. So why does it fail now? Well, that's because after your koniec the machine wants to wrap around into the next line, but in order to do that it needs to see a var. We have whitespace after ; in the input instead. So we need to change our definition of line to allow some whitespace at the beginning, but that also makes whitespace redundant in main, so let's try these definitions:
line = whitespace* var whitespace* ( ',' whitespace* var whitespace*)* koniec;
main:= line*;
Now we have this output:
[this]< >,< >[is]< >,< >[a]< >< >< >,< >[test]< >
< >[and],< >[this]
Which again is better, but still not good enough. Now you can see that it chokes on the newline, which is actually a bit unclear to me too. You say that
The end line is ';' or enter
Yet you want to get
line(and,this,is,ok)
So let's assume that enter starts a new line unless you have a comma at the end of the line. To specify that in the grammar, let's do this:
line = whitespace* var whitespace* ( ',' (whitespace | enter)* var whitespace*)* koniec;
Now you get this in the output:
[this]< >,< >[is]< >,< >[a]< >< >< >,< >[test]< >
< >[and],< >[this],[is],[ok]
Why is it not going further? That's because our line has to have a var, but we have an empty line in the input instead. That also raises the question of whitespace-only lines, so let's make our line work with whitespace-only content like this:
line = whitespace* (var whitespace* ( ',' (whitespace | enter)* var whitespace*)*)? koniec;
And bang! Suddenly you have all the word groups you want in the output. But you also have some extra lines, which are actually very easy to fix: you just need to move your pisz_enter action from koniec into the line like this:
vargroup = var whitespace* ( ',' %pisz_przecinek (whitespace | enter)* var whitespace*)* %pisz_enter;
line = whitespace* vargroup? koniec;
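To sanity-check these final definitions against the test set from the question, here is a small Python model of vargroup and line. It is a sketch, not the Ragel machine: regular expressions stand in for the state machine, the line(...) printing imitates what the actions would emit, and parse is a hypothetical helper. Unlike $err(), it only reports the offset where the offending line starts, and it assumes the input ends with a koniec character.

```python
import re

WS = r"[ \t\v\f]"
ENTER = r"[\r\n]"
STRING = r"[A-Za-z0-9_]+"
NUMBER = r"[+-]?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?"
VAR = rf"(?:{NUMBER}|{STRING})"
KONIEC = r"[;\r\n]"

# vargroup = var whitespace* ( ',' (whitespace | enter)* var whitespace* )* ;
VARGROUP = rf"{VAR}{WS}*(?:,(?:{WS}|{ENTER})*{VAR}{WS}*)*"
# line = whitespace* vargroup? koniec ;
LINE = re.compile(rf"{WS}*(?P<vargroup>{VARGROUP})?{KONIEC}")
VAR_RE = re.compile(VAR)


def parse(text):
    """Model of main := line*; returns one list of variables per non-empty line."""
    lines = []
    pos = 0
    while pos < len(text):
        m = LINE.match(text, pos)
        if m is None:
            raise ValueError(f"no valid line starts at offset {pos}")
        group = m.group("vargroup")
        if group:  # empty and whitespace-only lines produce no output
            lines.append(VAR_RE.findall(group))
        pos = m.end()  # always advances: koniec consumes one character
    return lines


GOOD = (
    "this , is , a , test ; and, this,\n"
    "is,ok\n"
    "next, trouble\n"
    "How,produce,good\n"
    "grammar;\n"
    "ok\n"
)

for variables in parse(GOOD):
    print("line(" + ",".join(variables) + ")")
```

With the good input this prints the six line(...) rows listed in the question, and both failing inputs ("this not , working" and "and," followed by "this,, too") raise ValueError.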
That's it. Two other things I can notice. First, you want your number to be something like
number = (('+'|'-')?[0-9]+'.'[0-9]+( [eE] ('+'|'-')? [0-9]+ )?) >Poczatek_Napisu %pisz_stala ;
so that it is printed properly. Second, you rely on the saved token start (poczatek_napisu) in your actions. If a token is split between chunks (which is very likely on any file longer than sizeof bufor), you're going to have a problem. It's not an FSM problem, the machine will work just fine; it's what you do in the actions. But that's beyond the scope of the current question.
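For illustration only, here is one common way to deal with tokens that straddle a chunk boundary, sketched in Python rather than in the C of the repository. It assumes the simplest possible setup: tokens are runs of [A-Za-z0-9_], and an unfinished token at the end of a chunk is carried over and glued to the start of the next one. The function tokens_from_chunks is hypothetical and is not taken from the original code.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_]+")
TRAILING_TOKEN = re.compile(r"[A-Za-z0-9_]+\Z")


def tokens_from_chunks(chunks):
    """Yield tokens from a sequence of text chunks, joining split tokens."""
    carry = ""  # unfinished token left over from the previous chunk
    for chunk in chunks:
        data = carry + chunk
        tail = TRAILING_TOKEN.search(data)
        if tail:
            # The chunk ends inside a token: keep it back until more data arrives.
            carry = data[tail.start():]
            data = data[:tail.start()]
        else:
            carry = ""
        yield from TOKEN.findall(data)
    if carry:
        # End of input: whatever was held back is a complete token.
        yield carry


print(list(tokens_from_chunks(["this,is,a,te", "st;ok"])))
# ['this', 'is', 'a', 'test', 'ok']
```

In a C action the same idea would mean copying the partial token to the front of the buffer before reading the next chunk, instead of keeping a pointer into data that is about to be overwritten.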