Stefan_E

Reputation: 117

Regexp::Grammars handling \n

I'm running the example from slide 15:

qr{
  <data>
  <rule: data>    <[text]>+
  <rule: text>    .+
}xm;

When running against a multi-line text:

line_1
line_2

I get:

'text' => [ 'line_1',
            '
            line_2' ]

and so far I have not managed to get rid of the '\n' captured at the front of the second line.

Running Regexp::Grammars 1.048 on top of Strawberry Perl 5.26.1.

Update / clarification: having (prematurely, sorry!) raised a bug against the module, Damian clarified as follows (reply slightly adapted to match the example above):

A rule with whitespace within it matches any whitespace (including newlines) in the input at that point. So a rule like:

<rule: text>    .+

is really equivalent to:

<rule: text><.ws>.+

meaning: match-but-don't-capture any leading whitespace, then match any-characters-except-newline.

If you want whitespace inside the rule to be ignored (as you seem to want here), then you need to declare the rule as a token instead. Tokens don't have the magical "whitespace-matches-whitespace" behaviour of rules. Hence you would write:

<token: line> .+

in which case you will also need to explicitly consume the newlines separating each line, with something like:

<rule: data> <[line]>+ % \n
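
To see why the newline ends up inside the captured text, here is a rough plain-Perl analogy of the two behaviours described above. It does not use Regexp::Grammars at all; the patterns and the variable names (@rule_like, @token_like) are my own stand-ins for illustration, not what the module generates internally.

```perl
use strict;
use warnings;

my $input = "line_1\nline_2";

# Stand-in for <rule: text> .+ : optional leading whitespace (newlines
# included), then non-newline characters. The whole match is kept.
my @rule_like = $input =~ /(\s*.+)/g;

# Stand-in for <token: line> .+ with an explicit \n separator:
# no implicit whitespace, and the newline stays outside the capture.
my @token_like = $input =~ /(.+)(?:\n|\z)/g;

for my $item (@rule_like, @token_like) {
    (my $shown = $item) =~ s/\n/\\n/g;   # make newlines visible
    print "'$shown'\n";
}
```

The first list comes back with the newline glued to the front of the second element, which matches the symptom in the question; the second list holds the two lines without it.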

Upvotes: 2

Views: 195

Answers (1)

Ken Schumack

Reputation: 719

This works:

qr{
  <data>
  <rule: data>  <[text]>+ % [\r\n]+
  <rule: text>  .+
}xm;

The lines of data are meant to be separated by EOL character(s), which is what the [\r\n]+ separator specifies. Note: Windows files typically end each line with both a carriage return \r and a line feed \n, hence the [\r\n]+ pattern. You can read more about this by running perldoc Regexp::Grammars and searching for "separator".
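
As a quick check of what [\r\n]+ treats as a separator, here is a small plain-Perl sketch using split rather than the grammar itself. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $unix    = "line_1\nline_2\n";        # LF line endings
my $windows = "line_1\r\nline_2\r\n";    # CRLF line endings

# The same character class handles both styles of line ending.
my @from_unix    = split /[\r\n]+/, $unix;
my @from_windows = split /[\r\n]+/, $windows;

# Because of the +, a run of EOL characters counts as one separator,
# so an empty line between two lines of data is swallowed.
my @with_blank = split /[\r\n]+/, "line_1\n\nline_2";

print scalar(@from_unix), " ", scalar(@from_windows), " ", scalar(@with_blank), "\n";
```

All three lists end up with the same two elements. If blank lines are significant in your data, a stricter separator such as \r?\n may be a better fit.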

Upvotes: 1
