tkit

How to detect a newline in Jison?

I have a piece of Jison code that looks like this:

%lex
%options flex

%{
if (!('regions' in yy)) {
    yy.regions = [];
}
%}

text                [a-zA-Z][a-zA-Z0-9]*

%%

\s+                 /* skip whitespace */
\n+                 return 'NL';
","                 return ',';
"-"                 return '-';
"["                 return '[';
"]"                 return ']';
{text}              return 'TEXT';
<<EOF>>             return 'EOF';

/lex

%start expressions

%%

expressions
    : content EOF
        {
            console.log(yy.regions);
            return yy.regions; 
        }
    | EOF
        {
            console.log("empty file");
            return yy.regions; 
        }
    ;

content
    : line NL content
        { console.log("NL"); }
    | line content
        { console.log("no NL"); }
    //| line NL
    //    { console.log("parsing line with NL"); }
    | line
        { console.log("parsing line"); }
    ;

line 
    : '[' text ']'
        { yy.regions.push($2); $$ = $2; }
    ;

text
    : TEXT
        { $$ = $1; }
    ;

This is what my input currently looks like (I started from the most basic construct I plan to support and intend to build up from there):

[sectionA]
[sectionB]
[sectionC]

The problem is that the newline is never detected: the parser always takes the "line content" alternative and never "line NL content". Later on I would like to parse something that looks more like this:

[sectionA]
something1, something2, something3
something4, something5, something6

[sectionB]
something4, something5, something6

[sectionC]
something4, something5, something6
something4, something5, something6
something4, something5, something6

This will get more complicated in the future, but my initial idea was to break the input down line by line (a newline would serve as the delimiter in many cases). I'm completely new to this, so I might have the wrong idea about how to solve it. So my question is: how do I detect the newline? And if there is a better approach to what I'm trying to do, any advice is more than welcome. Thanks.

Answers (2)

tkit

@rici's answer helped and put me on the right track. However, [ \t]+ didn't do what I needed. These are the two lines I ended up using:

(\r?\n)+\s*         return 'NEWLINE';
[^\S\r\n]+          ; /* whitespace */

I found them here.
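For reference, this is what the two patterns match as plain JavaScript regular expressions (assuming, as I understand it, that jison passes the patterns to RegExp essentially as written). The ^ anchors are added only so each pattern has to match at the start of the sample string, the way a lexer rule is tried at the current input position; the sample strings are made up.

```javascript
// The two patterns from the rules above, anchored at the start of the
// string to imitate a lexer trying a rule at the current input position.
const newlineRule = /^(\r?\n)+\s*/;   // return 'NEWLINE'
const whitespaceRule = /^[^\S\r\n]+/; // skipped

// A run of CRLF line endings plus the next line's indentation
// is consumed as a single NEWLINE token.
const nlMatch = newlineRule.exec("\r\n\r\n  [sectionB]");
console.log(JSON.stringify(nlMatch[0])); // "\r\n\r\n  "

// The whitespace rule stops before any \r or \n, so it can never
// swallow a line ending.
const wsMatch = whitespaceRule.exec("  \r\n[sectionB]");
console.log(JSON.stringify(wsMatch[0])); // "  "
console.log(whitespaceRule.exec("\r\n[sectionB]")); // null
```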

Edit: @rici's updated answer is clearer than this one and does exactly what I need, so I'm accepting it.

rici

Both of these rules will match a newline:

\s+                 /* skip whitespace */
\n+                 return 'NL';

Since the \s+ rule comes first, it wins: whenever \n+ could match, \s+ matches text at least as long, and a tie goes to the earlier rule. (Flex would warn you that the second rule can never be matched, but I don't believe jison does that analysis.)

Changing the order of the rules won't help, though, because \s+ would still match a space followed by a newline, swallowing any newline that is preceded by whitespace. You need to change the whitespace rule so that it only matches whitespace other than newlines.
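Both points can be reproduced with a toy lexer. The tokenize function below is a hypothetical stand-in, not jison's code: it tries every rule at the current position, keeps the longest match and gives ties to the earlier rule, which is my understanding of what flex does and what %options flex asks jison to do. (I believe that without that option jison simply takes the first rule that matches, which gives the same results for the inputs below.) Token names follow the question's grammar.

```javascript
// Hypothetical toy lexer (not jison's implementation): at each position try
// every rule, keep the longest match, and let the earlier rule win ties.
// A rule whose token is null is skipped, like a jison rule with no return.
function tokenize(rules, input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let best = null;
    for (const [source, token] of rules) {
      const re = new RegExp(source, "y"); // sticky: match exactly at pos
      re.lastIndex = pos;
      const m = re.exec(input);
      // Only a strictly longer match replaces an earlier rule's match.
      if (m && m[0].length > 0 && (best === null || m[0].length > best.text.length)) {
        best = { text: m[0], token };
      }
    }
    if (best === null) throw new Error("no rule matches at position " + pos);
    if (best.token !== null) tokens.push(best.token);
    pos += best.text.length;
  }
  return tokens.join(" ");
}

const otherRules = [
  ["\\[", "["],
  ["\\]", "]"],
  ["[a-zA-Z][a-zA-Z0-9]*", "TEXT"],
];

// Original order: \s+ comes first, so it wins every tie and NL never appears.
const original = [["\\s+", null], ["\\n+", "NL"], ...otherRules];
console.log(tokenize(original, "[sectionA]\n[sectionB]"));
// [ TEXT ] [ TEXT ]

// Swapped order: a bare newline now yields NL...
const swapped = [["\\n+", "NL"], ["\\s+", null], ...otherRules];
console.log(tokenize(swapped, "[sectionA]\n[sectionB]"));
// [ TEXT ] NL [ TEXT ]

// ...but a trailing space before the newline lets \s+ swallow it again.
console.log(tokenize(swapped, "[sectionA] \n[sectionB]"));
// [ TEXT ] [ TEXT ]
```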

One possibility would be:

\n\s*     return 'NL';
[^\S\n]+  /* ignore whitespace other than newlines */

The first pattern matches a newline followed by any sequence of whitespace, which means it also matches several consecutive newlines. That avoids returning more than one NL token when there is a blank line in the input; unless blank lines are significant, that's probably what you want.

The second pattern avoids matching any newline, so it cannot conflict with the first pattern.
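Here is the same hypothetical longest-match toy lexer (an assumption standing in for jison's %options flex mode, not its real code) fed with these two rules plus the question's other token rules, on a made-up input with trailing spaces, a blank line and one comma-separated line like those in the question:

```javascript
// Hypothetical toy lexer: longest match wins, ties go to the earlier rule,
// and a rule whose token is null is skipped.
function tokenize(rules, input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let best = null;
    for (const [source, token] of rules) {
      const re = new RegExp(source, "y"); // sticky: match exactly at pos
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0 && (best === null || m[0].length > best.text.length)) {
        best = { text: m[0], token };
      }
    }
    if (best === null) throw new Error("no rule matches at position " + pos);
    if (best.token !== null) tokens.push(best.token);
    pos += best.text.length;
  }
  return tokens.join(" ");
}

// The two rules from this answer, followed by the question's other rules.
const rules = [
  ["\\n\\s*", "NL"],     // newline plus any following whitespace / blank lines
  ["[^\\S\\n]+", null],  // whitespace other than newlines: skipped
  [",", ","],
  ["\\[", "["],
  ["\\]", "]"],
  ["[a-zA-Z][a-zA-Z0-9]*", "TEXT"],
];

// Trailing spaces are skipped, and the blank line yields only one NL.
const input = "[sectionA]  \n\n   [sectionB]\nsomething1, something2";
console.log(tokenize(rules, input));
// [ TEXT ] NL [ TEXT ] NL TEXT , TEXT
```

Only the newline rule can start at a \n, and only the whitespace rule can start at a space, tab or \r, so the two never compete for the same text and their relative order does not matter.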

Some people worry about Windows line endings (\r\n), but since JavaScript's \s includes \r, there is no real problem here: the \r is ignored by the second rule and the \n is recognized by the first. You could change the first rule to \r?\n\s* for efficiency if you thought it necessary, but it might not turn out to be any faster.
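A quick check of the \r\n behaviour with plain JavaScript regular expressions, using the same patterns with ^ anchors added so each one must match at the start of the (made-up) sample string:

```javascript
const crlfInput = "\r\n[sectionB]";

// JavaScript's \s does include the carriage return.
console.log(/\s/.test("\r")); // true

// The whitespace rule [^\S\n]+ consumes the \r but stops at the \n ...
const ws = /^[^\S\n]+/.exec(crlfInput);
console.log(JSON.stringify(ws[0])); // "\r"

// ... which the newline rule \n\s* then picks up as an NL token.
const nl = /^\n\s*/.exec(crlfInput.slice(ws[0].length));
console.log(JSON.stringify(nl[0])); // "\n"

// The optional \r?\n\s* variant matches the whole line ending in one go.
const variant = /^\r?\n\s*/.exec(crlfInput);
console.log(JSON.stringify(variant[0])); // "\r\n"
```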
