Perl6 Parse File

Question

As practice, I'm trying to parse some standard text that is an output of a shell command.

  pool: thisPool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: none requested
config:

    NAME                                                STATE     READ WRITE CKSUM
    homePool                                            ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7NUE93C      ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7RE2A4F      ONLINE       0     0     0
    cache
      ata-KINGSTON_SV300S37A60G_50026B7261025D7E-part3  ONLINE       0     0     0

errors: No known data errors

I want to use a Perl6 grammar and I want to capture each of the fields in a separate token or regex. So, I made the following grammar:

grammar zpool {
        regex TOP { \s+ [ <keyword> <collection> ]+ }
        token keyword { "pool: " | "state: " | "status: " | "action: " | "scan: " | "config: " | "errors: " }
        regex collection { [<:!keyword>]*  }
}

My idea is that the regex finds a keyword, then begins collecting all the data until the next keyword. However, each time, I just get "pool: " -> all the remaining text.

 keyword => ｢pool: ｣
 collection => ｢homePool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: none requested
config:

    NAME                                                STATE     READ WRITE CKSUM
    homePool                                            ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7NUE93C      ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7RE2A4F      ONLINE       0     0     0
    cache
      ata-KINGSTON_SV300S37A60G_50026B7261025D7E-part3  ONLINE       0     0     0

errors: No known data errors
｣

I don't know how to get it to stop eating the characters when it finds a keyword and then treat that as another keyword.

raiph · Accepted Answer

Problem 1

You've written <:!keyword> instead of <!keyword>. That's not what you want. You need to delete the :.

The <:foo> syntax in a P6 regex matches a single character with the specified Unicode property, in this case the property :foo which in turn means :foo(True).

And <:!keyword> matches a single character with the Unicode property :keyword(False).

But there is no Unicode property :keyword.

So the negative assertion will always be true and will always match a single character of input each time.

So the pattern just munches its way thru the rest of the text, as you know.
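To see the Unicode property syntax on its own, here is a minimal sketch using a property that does exist, Lu (uppercase letter). The strings and variable names are made up for illustration; the point is only that both the plain and the negated form consume exactly one character.

```raku
# <:Lu> matches one character that has the Unicode property "uppercase letter".
my $upper = 'aB' ~~ / <:Lu> /;
say $upper.Str;

# <:!Lu> negates the property: one character that is NOT an uppercase letter.
my $not-upper = 'aB' ~~ / <:!Lu> /;
say $not-upper.Str;

# In both cases the match is exactly one character long.
say $upper.Str.chars, ' ', $not-upper.Str.chars;
```

So a negated property test is still a "match one character" construct, which is why the original collection pattern never stops.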

Problem 2

Once you fix problem 1, a second problem arises.

<:!keyword> matches a single character with the Unicode property :keyword(False). It automatically munches some input (a single character) each time it matches.

In contrast, <!keyword> does not consume any input if it matches. You have to make sure the pattern that uses it munches input.
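A small sketch of that difference, assuming a cut-down lexical token named kw (hypothetical, standing in for your keyword token). The negative assertion <!kw> succeeds without moving forward, so it has to be paired with something like . that does the actual munching.

```raku
# Hypothetical stand-in for the keyword token, reduced to two alternatives.
my token kw { 'pool: ' | 'state: ' }

# <!kw> succeeds at the start of "ONLINE" (no keyword there),
# but the resulting match is empty: no input was consumed.
my $peek = 'ONLINE' ~~ / ^ <!kw> /;
say $peek.Str.chars;

# Paired with . the loop consumes one character per iteration
# and stops as soon as a keyword starts at the current position.
my $body = 'ONLINE state: x' ~~ / ^ [ <!kw> . ]* /;
say $body.Str;
```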

After fixing those two problems you'll get the sort of output you expected. (The next problem you'll see is that the config keyword doesn't work because the : in config: in your input file example isn't followed by a space.)
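One possible way round the config: case, purely as a sketch and not something the grammar above requires, is to make the space after the colon optional. The grammar name ZpoolLoose and the shortened input string are made up for illustration. Note the looser keyword would also fire on any "pool:" or similar that happens to appear in the body text.

```raku
my @keywords = <pool state status action scan config errors>;

grammar ZpoolLoose {
    token TOP        { \s+ [ <keyword> <collection> ]* }
    # Assumption for this sketch: the space after the colon is optional,
    # so "config:" directly followed by a newline still counts as a keyword.
    token keyword    { @keywords ':' ' '? }
    token collection { [ <!keyword> . ]* }
}

# Shortened, made-up input in the same shape as the zpool output.
my $text = " scan: none requested\nconfig:\n\n    NAME  STATE\nerrors: No known data errors\n";

my $match = ZpoolLoose.parse($text);
say $match<keyword>.map(*.Str.trim).join(' ');
```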

So, with a few clean ups:

my @keywords = <pool state status action scan config errors>;

say grammar zpool {
    token TOP        { \s+ [ <keyword> <collection> ]* }
    token keyword    { @keywords ': ' }
    token collection { [ <!keyword> . ]* }
}

I've switched all the patterns to token declarations. In general, always use token unless you know you need something else. (regex enables backtracking. That can dramatically slow things down if you're not careful. rule makes spaces in the rule significant.)

I've extracted the keywords into an array. @keywords means @keywords[0] | @keywords[1] | ....

I've added a . after <!keyword> in the last pattern (to consume a character's worth of input, to avoid the infinite loop that would otherwise occur given that <!keyword> does not consume any input).
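Putting those pieces together, here is a minimal sketch of the grammar actually being run. The grammar name ZpoolSketch, the %fields hash and the three-line input are assumptions for illustration; the input is a shortened version of the zpool output from the question.

```raku
my @keywords = <pool state status action scan config errors>;

grammar ZpoolSketch {
    token TOP        { \s+ [ <keyword> <collection> ]* }
    token keyword    { @keywords ': ' }
    token collection { [ <!keyword> . ]* }
}

# Shortened input; every keyword here is followed by ": ".
my $text = "  pool: thisPool\n state: ONLINE\nerrors: No known data errors\n";

my $match = ZpoolSketch.parse($text);

# Each keyword capture lines up with the collection capture at the same index.
my %fields;
for ^$match<keyword>.elems -> $i {
    my $name = $match<keyword>[$i].Str.trim.chop;   # drop the trailing ':'
    %fields{$name} = $match<collection>[$i].Str.trim;
}

.say for %fields.sort;
```

Because <keyword> and <collection> sit inside a quantified group, each shows up in the match as a list, which is why the loop walks them by index.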

In case you haven't seen them, note that the available grammar debugging options are your friend.

Hth
