Reputation: 1
I am using the pegjs parser generator for a project, and I am having difficulty creating a grammar that matches all words up to any one of a set of words that it should not match. For example, in the string "the door is yellow" I want to match all words up to "is", then tell the pegjs parser to continue parsing from the word "is". The words I want the parser to break on are "is", "has", and "of".
My current grammar rule is as follows:
subject "sub" =
s:[a-zA-Z ]+ { return s.join("").trim()}
How can I create a lookahead that stops the parser from including the words in my collection?
(!of|is|has)
Upvotes: 0
Views: 837
Reputation: 10414
I know this question was asked 5 years ago, but I'm just running through cleaning up unanswered questions in the [pegjs] tag.
This seems to work; you just need to replace postfix with your further processing rule.
subject "sub" = prefix:prefix breakWord:breakWord postfix:postfix "\n"? {
return { prefix: prefix, breakWord, postfix }
}
prefix = $(!breakWord .)* { return text().trim() }
postfix = [^\n]* { return text().trim() }
breakWord
= "is"
/ "has"
/ "of"
which generates this with an input of "the door is yellow":
{ prefix: "the door", breakWord: "is", postfix: "yellow" }
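To make the scan concrete, here is a rough plain-JavaScript emulation of what the grammar above does, written without pegjs so it runs on its own. The helper name splitAtBreakWord and the wholeWords option are hypothetical and not part of pegjs. The option is an assumption added for illustration: as written, the grammar stops at the first occurrence of a break word even inside a longer word such as "this".

```javascript
// Emulates `prefix = $(!breakWord .)*`, then breakWord, then postfix.
// BREAK_WORDS mirrors the alternatives of the breakWord rule, in the same order.
const BREAK_WORDS = ["is", "has", "of"];

const isLetter = (c) => c !== undefined && /[a-zA-Z]/.test(c);

// Hypothetical helper; `wholeWords` is an illustrative extension, not pegjs behaviour.
function splitAtBreakWord(input, { wholeWords = false } = {}) {
  for (let i = 0; i < input.length; i++) {
    // Negative lookahead: does any break word start at position i?
    const word = BREAK_WORDS.find((w) => {
      if (!input.startsWith(w, i)) return false;
      if (!wholeWords) return true;
      // Reject matches that sit inside a longer word.
      return !isLetter(input[i - 1]) && !isLetter(input[i + w.length]);
    });
    if (word === undefined) continue; // lookahead passed, so `.` consumes one character

    // Like `[^\n]*`, the postfix is the text up to the first newline.
    const rest = input.slice(i + word.length).split("\n")[0];
    return { prefix: input.slice(0, i).trim(), breakWord: word, postfix: rest.trim() };
  }
  return null; // no break word found; the subject rule would fail to match
}

// prefix "the door", breakWord "is", postfix "yellow"
console.log(splitAtBreakWord("the door is yellow"));
// prefix "th", breakWord "is", postfix "door is yellow" (breaks inside "this")
console.log(splitAtBreakWord("this door is yellow"));
// prefix "this door", breakWord "is", postfix "yellow"
console.log(splitAtBreakWord("this door is yellow", { wholeWords: true }));
```

The loop checks every break word at every position, which is the same per-character lookahead cost the grammar pays.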
Note a couple of things:
(!breakWord .) is a little slow: for each character in the prefix, it looks ahead to make sure the current input doesn't begin with any of the alternatives in the breakWord rule.
The postfix rule assumes that a newline might terminate the input.
Upvotes: 1
Reputation: 680
This will work
.+(?=\s+(of|is|has))
It matches one or more of any characters (except line breaks) until it encounters either 'of', 'is', or 'has' (via a positive lookahead) with white space before them.
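As a quick check, the expression can be tried as an ordinary JavaScript regular expression, outside a pegjs grammar. The sketch below also shows two things the description does not mention: the greedy .+ runs to the last break word on the line, and "is" also matches at the start of a longer word. The lazyWholeWord variant and the matchPrefix helper are hypothetical additions for comparison, not part of the answer.

```javascript
// The regex from the answer, unchanged.
const greedy = /.+(?=\s+(of|is|has))/;

// Hypothetical variant: lazy quantifier (stop at the first break word)
// and \b so that "is" does not match the start of "isolates".
const lazyWholeWord = /.+?(?=\s+(?:of|is|has)\b)/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function matchPrefix(re, input) {
  const m = re.exec(input);
  return m ? m[0] : null;
}

console.log(matchPrefix(greedy, "the door is yellow"));                   // "the door"
console.log(matchPrefix(greedy, "the door is part of the house"));        // "the door is part"
console.log(matchPrefix(lazyWholeWord, "the door is part of the house")); // "the door"
console.log(matchPrefix(greedy, "the door isolates noise"));              // "the door"
console.log(matchPrefix(lazyWholeWord, "the door isolates noise"));       // null
```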
Upvotes: -1