Reputation: 62848
Apparently I'm too dumb to figure this out...
Consider the following string:
foobar(123, 456, 789)
I'm trying to work out how to parse this. In particular,
    call = do
      cs <- many1 letter
      char '('
      as <- many argument
      return (cs, as)

    argument = manyTill anyChar (char ',' <|> char ')')
This works perfectly, until I add stuff to the end of the input string, at which point it tries to parse that stuff as the next argument, and gets upset when it doesn't end with a comma or bracket.
Fundamentally, the trouble is that a comma is a separator, while a bracket is a terminator. Parsec doesn't appear to provide a combinator for that.
Just to make things more interesting, the input string can also be
foobar(123, 456, ...
which indicates that the message is incomplete. There appears to be no way of parsing a sequence with two possible terminators and knowing which one was found. (I actually want to know whether the argument list was complete or incomplete.)
Can anyone figure out how I climb out of this?
Upvotes: 1
Views: 327
Reputation: 19647
You should exclude your separator/terminator characters from the allowed characters for a function argument. Also, you can use between and sepBy to make the difference between separators and terminators clearer:
    call = do
      cs <- many1 letter
      as <- between (char '(') (char ')')
              $ sepBy (many1 (noneOf ",)")) (char ',')
      return (cs, as)
However, this is probably still not what you want, because it doesn't handle whitespace properly. You should look at Text.Parsec.Token for a more robust way to do this.
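To give an idea of what the token-parser route might look like, here is a minimal sketch using Text.Parsec.Token with the default emptyDef language definition. The names lexer and callTok are made up for this example, and it assumes the arguments are plain integers as in the question's sample input; adapt the argument parser if yours are something else.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

-- A token parser built from the empty language definition.
-- Every parser taken from it skips trailing whitespace.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical whitespace-tolerant version of 'call'.
-- Assumption: arguments are integers.
callTok :: Parser (String, [Integer])
callTok = do
  name <- Tok.identifier lexer
  args <- Tok.parens lexer (Tok.commaSep lexer (Tok.integer lexer))
  return (name, args)
```

With this, parse callTok "" "foobar(123, 456, 789)" should accept the spaces after the commas, since identifier, parens, commaSep and integer are all lexeme parsers.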
With the ... addition, it does become a bit weird, and I don't think it fits nicely into any of the predefined combinators, so we'll just have to do it ourselves.
Let's define a type for our results:
    data Args = String :. Args | Nil | Dots
      deriving Show

    infixr 5 :.
That's like a list, but it has two different kinds of "empty list" to distinguish the ... case. Of course, you can also use ([String], Bool) as a result type, but I'll leave that as an exercise.
The following assumes we have
    import Control.Applicative ((<$>), (<*>), (<$), (*>))
The parsers become:
    call = do
      cs <- many1 letter
      char '('
      as <- args
      return (cs, as)

    args =
          (:.) <$> arg <*> argcont
      <|> Dots <$ string "..."

    arg = many1 (noneOf ".,)")

    argcont =
          Nil <$ char ')'
      <|> char ',' *> args
This handles everything fine except whitespace, for which my original recommendation to look at token parsers remains.
Let's test:
    GHCi> parseTest call "foobar(foo,bar,baz)"
    ("foobar","foo" :. ("bar" :. ("baz" :. Nil)))
    GHCi> parseTest call "foobar(1,2,..."
    ("foobar","1" :. ("2" :. Dots))
    GHCi> parseTest ((,) <$> call <*> call) "foo(1)bar(2,...)"
    (("foo","1" :. Nil),("bar","2" :. Dots))
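For anyone who wants to see the ([String], Bool) variant mentioned earlier, here is one possible sketch. The names callFlag, argsFlag and contFlag are invented for this example, and the convention that True means "closed by a bracket" and False means "cut off by ..." is an assumption of the sketch, not something fixed by Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Result: (function name, (arguments, complete?))
-- complete? is True when the list ended with ')',
-- False when it was cut off by "...".
callFlag :: Parser (String, ([String], Bool))
callFlag = do
  cs <- many1 letter
  _  <- char '('
  as <- argsFlag
  return (cs, as)

-- Either the "..." marker, or one argument followed by a continuation.
argsFlag :: Parser ([String], Bool)
argsFlag =
      ([], False) <$ string "..."
  <|> do a <- many1 (noneOf ".,)")
         (rest, complete) <- contFlag
         return (a : rest, complete)

-- After an argument: a closing bracket ends the list,
-- a comma continues it.
contFlag :: Parser ([String], Bool)
contFlag =
      ([], True) <$ char ')'
  <|> (char ',' *> argsFlag)
```

The structure mirrors the Args version: the two "empty list" constructors are replaced by the Bool that is threaded back up through the recursion. Like the Args version, it ignores whitespace and does not accept an empty argument list.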
Upvotes: 2