gentleBandit

Reputation: 1

Why does Ragel execute to-state and from-state actions twice?

I am testing out the functionality for to- and from-state actions in Ragel. I have the following Ragel program:

ragelScaffolding.rl:

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

char *p, *pe;
int cs;

void runRagelMachine(char instructions[], int instructionLen){
    p = instructions;
    pe = p + instructionLen;
%%{
    machine test;
    action testToAction1{
        puts("1");
    }

    action testFromAction1{
        puts("f1");
    }

    action testToAction2{
        puts("2");
    }

    test = (
        start: (
            any -> s1
        ),
        s1: (
            any -> s2
        )$to(testToAction1) $from(testFromAction1),
        s2: (
            any -> final
        )$to(testToAction2)
    );

    main := test;
    write data;
    write init;
    write exec;
}%%
}

int main(){
    char buf[1024];
    runRagelMachine(buf, 1024);
}

I would expect this to output the following:

1
f1
2

But instead it outputs:

1
f1
1
2
f1
2

This tells me that it runs these actions twice. I have been thinking about why this might be the case and reading the documentation, but I can't figure out why it happens. It happens when compiling with Ragel 6.9 and 7 (and compiling the C with gcc). The documentation says the following:

To-state actions are executed whenever the state machine moves into the specified state, either by a natural movement over a transition or by an action-based transfer of control such as fgoto. They are executed after the in-transition’s actions but before the current character is advanced and tested against the end of the input block.

But there is nothing in there about executing actions twice. I would really appreciate any help or clarification on this matter.

Thanks in advance.

Upvotes: 0

Views: 205

Answers (2)

John Sensebe

Reputation: 1396

The problem is the use of the $ operators, which embed the action in every state of the expression they are applied to. This means the action runs on the start state of each label as well as on the state that control is transferred to. In this case you should use the > operators, which embed the action only in the start state of each label. This ensures that each action is called only once per label. The machine would then look like this:

test = (
    start: (
        any -> s1
    ),
    s1: (
        any -> s2
    )>to(testToAction1) >from(testFromAction1),
    s2: (
        any -> final
    )>to(testToAction2)
);

Here is the state diagram for the above:

[Image: state diagram for the above machine]

As you can see, each action is called only once.

Upvotes: 1

Gnusam

Reputation: 48

When trying to understand how Ragel works, you can generate a Graphviz dot file with the -V option.

Here is the Graphviz rendering of your Ragel file:

[Image: Graphviz state diagram]

After some reflection on your question, here is how I think Ragel operates. I made the machine match specific characters instead of any, which makes things easier to follow.

I changed your code to:

    test = (
      start: (
          '1' -> s1
      )$to(testFromAction1),
      s1: (
          '2' -> s2
      )$to(testToAction1),
      s2: (
          '3' -> final
      )$to(testToAction2)
    );

and called it with:

int main()
{
  char buf[5];

  strcpy(buf, "1341");
  runRagelMachine(buf, 5);
}

This input should not match the whole machine.

The Graphviz output now looks like this:

[Image: Graphviz state diagram]

It is pretty much the same, but if I run it, here is the output:

f1
1

It matched the '1', which triggered the start state and called testFromAction1. Nothing matched in state s1, but that did not prevent the call to testToAction1.

If we call it with:

  strcpy(buf, "1234");
  runRagelMachine(buf, 5);

It should match all the states. We get this output:

f1
1
1
2
2

Ragel parses our string in three steps.

  • First, it matches the first character of the input against the start state. It calls testFromAction1 (prints f1) and evaluates s1, which prints the first '1'.
  • Then the parser's cursor moves to the next character of the input. It verifies that the current state (s1) matches. To do so it evaluates s1 again, which is why a second '1' is printed. Since s1 transfers to s2, it evaluates s2 too, which is why the first '2' is printed. Since the character under the cursor matched '2', the cursor moves to the '3'.
  • Now s2 is evaluated once more and prints a '2', since we execute it to reach the final state.

Running it with one last bogus input confirms this logic:

  strcpy(buf, "1243");
  runRagelMachine(buf, 5);

This time it prints:

f1
1
1
2

We can see that parsing stopped at the '4', yet 4 lines were still printed. The same logic as described earlier applies.

I'm not sure this entirely answers your question, but I hope it helps you understand Ragel a bit better.

Upvotes: 0
