Reputation: 211
I'm trying to wrap my head around PEG by entering simple grammars into the PEG.js playground.
Example 1:
"abcdef1234567ghijklmn8901opqrs"
Desired output: ["abcdef", "1234567",
"ghijklmn", "8901", "opqrs"]
Actual output: ["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]]
This example pretty much works, but can I get PEG.js to not nest the resulting array to a million levels? I assume the trick is to use concat()
instead of join()
somewhere, but I can't find the spot.
start
= Text
Text
= Numbers Text
/ Characters Text
/ EOF
Numbers
= numbers: [0-9]+ {return numbers.join("")}
Characters
= text: [a-z]+ {return text.join("")}
EOF
= !.
Example 2:
Same problem and code as Example 1, but change the Characters rule to the following, which I expected would produce the same result.
Characters
= text: (!Numbers .)+ {return text.join("")}
The resulting output is:
[",a,b,c,d,e,f", ["1234567", [",g,h,i,j,k,l,m,n", ["8901", [",o,p,q,r,s", ""]]]]]
Why do I get all these empty matches?
Example 3:
Last question. This doesn't work at all. How can I make it work? And for bonus points, any pointers on efficiency? For example, should I avoid recursion if possible?
I'd also appreciate a link to a good PEG tutorial. I've read (http://www.codeproject.com/KB/recipes/grammar_support_1.aspx), but as you can see I need more help ...
'abcdefghijklmnop"qrstuvwxyz"abcdefg'
["abcdefghijklmnop", "qrstuvwxyz",
"abcdefg"]
"abcdefghijklmnop\"qrstuvwxyz\"abcdefg"
start
= Words
Words
= Quote
/ Text
/ EOF
Quote
= quote: ('"' .* '"') Words {return quote.join("")}
Text
= text: (!Quote . Words) {return text.join("")}
EOF
= !.
Upvotes: 21
Views: 7666
Reputation: 17387
For current versions of pegjs
, you might try:
Input: "abcdef1234567ghijklmn8901opqrs"
Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]
{
/**
* Deeply flatten an array.
* @param {Array} arr - array to flatten
* @return {Array} - flattened array
*/
const flatten = (arr) => Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}
start = result:string {
console.log(JSON.stringify(result))
return result
}
string = head:chars tail:( digits chars? )* {
return flatten([head,tail])
}
chars = [a-z]+ {
return text()
}
digits = $[0-9]+ {
return text()
}
Should be easy to deduce from the answer above.
Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'
Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]
{
/**
* Deeply flatten an array.
* @param {Array} arr - array to flatten
* @return {Array} - flattened array
*/
const flatten = (arr) => Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}
start = result:string {
console.log(JSON.stringify(result))
return result
}
string = head:chars tail:quote_chars* {
return flatten([head,tail])
}
quote_chars = DQUOTE chars:chars {
return chars
}
chars = [a-z]+ {
return text()
}
DQUOTE = '"'
Upvotes: 1
Reputation: 1605
start
= alnums
alnums
= alnums:(alphas / numbers) {
return alnums;
}
alphas
= alphas:$(alpha+)
numbers
= numbers:$(number+)
number
= [0-9]
alpha
= [a-zA-Z]
ignore
> 'abcdefghijklmnop"qrstuvwxyz"abcdefg'.split('"')
[ 'abcdefghijklmnop',
'qrstuvwxyz',
'abcdefg' ]
Upvotes: -2
Reputation: 221
I received a reply in the PEG.js Google Group that helped me onto the right track. I'm posting answers to all three problems in the hope that they can serve as a rudimentary tutorial for other PEG beginners like myself. Notice that no recursion is needed.
This is straightforward once you understand basic PEG idioms.
start
= Text+
Text
= Numbers
/ Characters
Numbers
= numbers: [0-9]+ {return numbers.join("")}
Characters
= text: [a-z]+ {return text.join("")}
The problem here is a peculiar design choice in the PEG.js parser generator for Peek expressions (&expr and !expr). Both peek ahead into the input stream without consuming any characters, so I incorrectly assumed that they didn't return anything. However, they both return an empty string. I hope the author of PEG.js changes this behavior, because (as far as I can tell) this is just unnecessary cruft that pollutes the output stream. Please correct me if I'm wrong about this!
Anyway, here is a workaround:
start
= Text+
Text
= Numbers
/ Words
Numbers
= numbers: [0-9]+ {return numbers.join("")}
Words
= text: Letter+ {return text.join("")}
Letter
= !Numbers text: . {return text}
The problem is that an expression like ('"' .* '"')
can never succeed. PEG is always greedy, so .*
will consume the rest of the input stream and never see the second quote. Here is a solution (that incidentally needs the same Peek workaround as in Example 2).
start
= Words+
Words
= QuotedString
/ Text
QuotedString
= '"' quote: NotQuote* '"' {return quote.join("")}
NotQuote
= !'"' char: . {return char}
Text
= text: NotQuote+ {return text.join("")}
Upvotes: 22