Regexp::Grammars handling \n

Question

I'm running the example from slide 15:

qr{
  
      <[text]>+
      <rule: text> .+
}xm;

When I run it against multi-line text:

line_1
line_2

I get:

'text' => [ 'line-1',
            '
            line-2' ]

and so far I have not succeeded in getting rid of the newline ('\n') in front of the second captured line.

Running Regexp::Grammars 1.048 on top of Strawberry Perl 5.26.1.

Update/clarification: having (prematurely, sorry!) raised a bug against the module, I received the following clarification from Damian (reply slightly adapted to match the example above):

A rule with whitespace within it matches any whitespace (including newlines) in the input at that point. So a rule like:

<rule: text> .+

is really equivalent to:

<rule: text> <.ws>.+

meaning: match-but-don't-capture any leading whitespace, then match any-characters-except-newline.

If you want whitespace inside the rule to be ignored (as you seem to want here), then you need to declare the rule as a token instead. Tokens don't have the magical "whitespace-matches-whitespace" behaviour of rules. Hence you would write:

<token: text> .+

in which case you will also need to explicitly consume the newlines separating each line, with something like:

 <[line]>+ % \n
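A minimal, self-contained sketch of Damian's suggestion, assuming Regexp::Grammars is installed from CPAN (it is not a core module). It keeps the text name from the question's example, declares it as a token so there is no implicit leading whitespace match, and consumes the newlines explicitly with the % \n separator. The $input string and the @lines variable are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regexp::Grammars is a CPAN module; it must be installed separately.
use Regexp::Grammars;

# "text" is a token, so the space before .+ is just /x layout, not <.ws>.
# The list separator \n consumes the line breaks between items explicitly.
my $parser = qr{
    <[text]>+ % \n

    <token: text> .+
}xm;

# Hypothetical sample input mirroring the two lines from the question.
my $input = "line_1\nline_2\n";

my @lines;
if ($input =~ $parser) {
    # After a successful match, Regexp::Grammars stores its results in %/.
    @lines = @{ $/{text} };
}

# Brackets make any stray leading/trailing newline visible.
print "[$_]\n" for @lines;
```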

Ken Schumack · Accepted Answer

This works:

qr{
  
    <[text]>+ % [\r\n]+
    <rule: text> .+
}xm;

The lines of data are meant to be separated by EOL character(s), which the [\r\n]+ separator specifies. Note: some Windows files end each line with both a carriage return and a line feed character, hence the [\r\n]+ pattern. You can read more about this by running perldoc Regexp::Grammars and searching for "separator".
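A sketch of the accepted answer's variant as a complete script, assuming Regexp::Grammars is installed from CPAN. Here text stays a rule, and the [\r\n]+ separator consumes the end-of-line characters between items, so the next item's implicit leading whitespace match has no newline left to pick up. The $input string and @lines variable are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regexp::Grammars is a CPAN module; it must be installed separately.
use Regexp::Grammars;

# "text" remains a rule; the separator [\r\n]+ eats the line endings
# between list items instead of leaving them for the rule's <.ws>.
my $parser = qr{
    <[text]>+ % [\r\n]+

    <rule: text> .+
}xm;

# Hypothetical LF-terminated sample input.
my $input = "line_1\nline_2\n";

my @lines;
if ($input =~ $parser) {
    # Results of a successful match are left in %/.
    @lines = @{ $/{text} };
}

print "[$_]\n" for @lines;
```

One caveat worth checking on real Windows data: because . also matches \r, CRLF input that reaches the regex unchanged may leave a trailing \r on each captured line, since the greedy .+ takes it before the separator gets a chance.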
