Reputation: 679
I'm attempting to write an Augmented Backus-Naur form parser. However, I am coming across a Stack Overflow exception whenever I attempt to parse alternatives. Below is an example which triggers the issue:
#r @"..\packages\FParsec\lib\net40-client\FParsecCS.dll"
#r @"..\packages\FParsec\lib\net40-client\FParsec.dll"
open FParsec
type Parser<'t> = Parser<'t, unit>
type Element =
| Alternates of Element list
| ParsedString of string
let (pRuleElement, pRuleElementRef) : (Parser<Element> * Parser<Element> ref) = createParserForwardedToRef()
let pString =
pchar '"' >>. manyCharsTill (noneOf ['"']) (pchar '"')
|>> ParsedString
let pAlternates : Parser<_> =
sepBy1 pRuleElement (many (pchar ' ') >>. (pchar '/') >>. many (pchar ' ') )
|>> Alternates
do pRuleElementRef :=
choice
[
pString
pAlternates
]
"\"0\" / \"1\" / \"2\" / \"3\" / \"4\" / \"5\" / \"6\" / \"7\""
|> run (pRuleElement .>> (skipNewline <|> eof))
The issue is easily resolved by simply reordering the choice
like so:
do pRuleElementRef :=
choice
[
pAlternates
pString
]
However, that then causes a Stack Overflow because it continuously attempts to parse a new sequence of alternatives without consuming input. In addition, that method would then break ABNF precedence:
My question essentially boils down to this: How can I combine parsing of a single element that can be a sequence of elements or a single instance of an element? Please let me know if you require any clarification / additional examples.
Your help is much appreciated, thank you!
EDIT:
I should probably mention that there are various other kinds of groupings as well. A sequence group (element[s])
and an optional group [optional element[s]
. Where element
can be nested groups / optional groups / strings / other element types. Below is an example with sequence group parsing (optional group parsing not included for simplicity):
#r @"..\packages\FParsec\lib\net40-client\FParsecCS.dll"
#r @"..\packages\FParsec\lib\net40-client\FParsec.dll"
open FParsec
type Parser<'t> = Parser<'t, unit>
type Element =
| Alternates of Element list
| SequenceGroup of Element list
| ParsedString of string
let (pRuleElement, pRuleElementRef) : (Parser<Element> * Parser<Element> ref) = createParserForwardedToRef()
let pString =
pchar '"' >>. manyCharsTill (noneOf ['"']) (pchar '"')
|>> ParsedString
let pAlternates : Parser<_> =
pipe2
(pRuleElement .>> (many (pchar ' ') >>. (pchar '/') >>. many (pchar ' ')))
(sepBy1 pRuleElement (many (pchar ' ') >>. (pchar '/') >>. many (pchar ' ') ))
(fun first rest -> first :: rest)
|>> Alternates
let pSequenceGroup : Parser<_> =
between (pchar '(') (pchar ')') (sepBy1 pRuleElement (pchar ' '))
|>> SequenceGroup
do pRuleElementRef :=
choice
[
pAlternates
pSequenceGroup
pString
]
"\"0\" / ((\"1\" \"2\") / \"2\") / \"3\" / (\"4\" / \"5\") / \"6\" / \"7\""
|> run (pRuleElement .>> (skipNewline <|> eof))
If I attempt to parse alternates / sequence groups first, it terminates with a stack overflow
exception because it then tries to parse alternates repeatedly.
Upvotes: 3
Views: 111
Reputation: 679
It looks like the solution was simply not attempting to parse alternatives whilst parsing alternatives in order to avoid an infinite loop resulting in a stack overflow. A working version of the code posted in my question is as follows:
#r @"..\packages\FParsec\lib\net40-client\FParsecCS.dll"
#r @"..\packages\FParsec\lib\net40-client\FParsec.dll"
open FParsec
type Parser<'t> = Parser<'t, unit>
type Element =
| Alternates of Element list
| SequenceGroup of Element list
| ParsedString of string
let (pRuleElement, pRuleElementRef) : (Parser<Element> * Parser<Element> ref) = createParserForwardedToRef()
let (pNotAlternatives, pNotAlternativesRef) : (Parser<Element> * Parser<Element> ref) = createParserForwardedToRef()
let pString =
pchar '"' >>. manyCharsTill (noneOf ['"']) (pchar '"')
|>> ParsedString
let pAlternates : Parser<_> =
pipe2
(pNotAlternatives .>>? (many (pchar ' ') >>? (pchar '/') >>. many (pchar ' ')))
(sepBy1 pNotAlternatives (many (pchar ' ') >>? (pchar '/') >>. many (pchar ' ') ))
(fun first rest -> first :: rest)
|>> Alternates
let pSequenceGroup : Parser<_> =
between (pchar '(') (pchar ')') (sepBy1 pRuleElement (pchar ' '))
|>> SequenceGroup
do pRuleElementRef :=
choice
[
pAlternates
pSequenceGroup
pString
]
do pNotAlternativesRef :=
choice
[
pSequenceGroup
pString
]
"\"0\" / (\"1\" \"2\") / \"3\" / (\"4\" / \"5\") / \"6\" / \"7\""
|> run (pRuleElement .>> (skipNewline <|> eof))
In addition to the addition of pNotAlternatives
I also modified it so that it would backtrack when failing to parse the alternative separator /
which allows it to proceed after "realizing" that it wasn't a list of alternatives after all.
Upvotes: 0
Reputation: 243096
The issue is that when you run the pRuleElement
parser on the input, it correctly parses one string, leaving some unconsumed input, but then it fails later outside of the choice
that would backtrack.
You can run the pAlternates
parser on the main input, which actually works:
"\"0\" / \"1\" / \"2\" / \"3\" / \"4\" / \"5\" / \"6\" / \"7\""
|> run (pAlternates .>> (skipNewline <|> eof))
I suspect that you can probably just do this - the pAlternates
parser works correctly, even on just a single string - it will just return Alternates
containing a singleton list.
Upvotes: 2