Fly Man

Reputation: 11

Why does Pest parser fail when requiring whitespace between tokens?

I am trying to build a simple parser using Rust and pest. My pest rules are:

WHITESPACE = { " " | "\t" | "\n" | "\r"  }
STRING = @{"\"" ~ (!("\"" | "\n") ~ ANY)* ~ "\"" }
Name = { ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

This is the LET rule:

LET = { "let" ~ WHITESPACE+ ~ Name ~ WHITESPACE* ~ "=" ~ WHITESPACE* ~ STRING ~ WHITESPACE* ~ ";" }

The problem is that the parser fails on this line:

let var_name = "string";

If I replace WHITESPACE+ in the LET rule with WHITESPACE*, it works, but I want to require at least one space between the let keyword and var_name in my input.

The questions are:

Upvotes: 1

Views: 48

Answers (1)

Jmb

Reputation: 23443

WHITESPACE is a special rule: in a normal (non-atomic) rule, WHITESPACE* is inserted implicitly at each ~. So in your case "let" ~ WHITESPACE+ is actually equivalent to "let" ~ WHITESPACE* ~ WHITESPACE+. The implicit WHITESPACE* is greedy and never gives characters back, so the space has already been consumed by the time the parser tries to match WHITESPACE+, and that match fails. If you want to control where whitespace is allowed, you need to use an atomic rule, which turns off the implicit insertion:

LET_KW = @{ "let" ~ WHITESPACE }
LET = { LET_KW ~ Name ~ "=" ~ STRING ~ ";" }
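
To see why the original rule fails and the atomic one succeeds, here is a minimal sketch in plain Rust that imitates the two behaviours by hand. It does not use pest at all: the function names (skip_ws, ws_plus, let_non_atomic, let_kw_atomic) are made up for illustration, and the matching logic is only an approximation of what pest generates, assuming greedy repetition with no backtracking.

```rust
/// Same character set as the WHITESPACE rule in the question.
fn is_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Greedy `WHITESPACE*`: consumes as much as it can and never gives anything back.
fn skip_ws(input: &[u8], pos: usize) -> usize {
    let mut p = pos;
    while p < input.len() && is_ws(input[p]) {
        p += 1;
    }
    p
}

/// `WHITESPACE+`: succeeds only if at least one whitespace byte is consumed.
fn ws_plus(input: &[u8], pos: usize) -> Option<usize> {
    let end = skip_ws(input, pos);
    if end > pos { Some(end) } else { None }
}

/// Match a literal at `pos`, returning the position just after it.
fn literal(input: &[u8], pos: usize, lit: &str) -> Option<usize> {
    if input[pos..].starts_with(lit.as_bytes()) {
        Some(pos + lit.len())
    } else {
        None
    }
}

/// Imitates `"let" ~ WHITESPACE+` inside a normal rule:
/// the `~` contributes an implicit `WHITESPACE*` before the explicit `WHITESPACE+`.
fn let_non_atomic(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    let pos = literal(bytes, 0, "let")?;
    let pos = skip_ws(bytes, pos); // implicit WHITESPACE* eats the space
    ws_plus(bytes, pos) // nothing is left for the explicit WHITESPACE+
}

/// Imitates `@{ "let" ~ WHITESPACE }`: no implicit skipping, exactly one whitespace byte.
fn let_kw_atomic(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    let pos = literal(bytes, 0, "let")?;
    if pos < bytes.len() && is_ws(bytes[pos]) {
        Some(pos + 1)
    } else {
        None
    }
}

fn main() {
    let src = "let var_name = \"string\";";
    println!("non-atomic: {:?}", let_non_atomic(src)); // None
    println!("atomic:     {:?}", let_kw_atomic(src)); // Some(4)
    println!("no space:   {:?}", let_kw_atomic("letvar_name")); // None
}
```

The non-atomic version fails on the sample line even though a space is present, while the atomic version accepts it and still rejects letvar_name, which is the behaviour the question asks for.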

Upvotes: 1
