Reputation: 62818
So I'm having a go at writing a complex parser, using only Applicative (the parser in question doesn't even implement Monad at all).
For trivial parsers, this is quite easy. For non-trivial ones... not so much. The applicative interface seems to violently force you to write everything in point-free style. This is extremely difficult to deal with.
Consider, for example:
call = do
n <- name
char '('
trim
as <- sepBy argument (char ',' >> trim)
char ')'
trim
char '='
r <- result
return $ Call {name = n, args = as, result = r}
Now let's try to write that using applicative:
call =
(\ n _ _ as _ _ _ _ r -> Call {name = n, args = as, result = r}) <$>
name <*>
char '(' <*>
trim <*>
sepBy argument (const const () <$> char ',' <*> trim) <*>
char ')' <*>
trim <*>
char '=' <*>
trim <*>
result
Applicative has forced me to put the variable bindings very far away from where the actual parser is. (E.g., try to confirm that as
is actually bound to sepBy argument ...
; it's not at all easy to verify that I haven't got the wrong count of _
patterns!)
Another very unintuitive thing is that <*>
applies a function to a value, but *>
and <*
are just pure sequencing. This took for ages to wrap my mind around. Different method names would have made this far, far clearer. (But Monad seems to have grabbed >>
and <<
, sadly.) It seems that these can be stacked, yielding things like
exit =
"EXIT:" *>
trim *>
name <*
char '(' <*
trim <*
char ')' <*
trim
It's rather non-obvious that you can do this. And, to me, this code really isn't terribly readable. More importantly, I still haven't figured out how you deal with collecting multiple values while dropping multiple other values.
In all, I find myself wishing I could just use do-notation! I don't actually need to change effects based on prior results; I don't need the power of Monad. But the notation is so much more readable. (I keep wondering whether it would actually be feasible to implement this; can you syntactically tell when a particular do-block can be mechanically transformed to applicative?)
Does anybody know of a way around these problems? Most particularly, how can I move the variable bindings closer to the parser they bind to?
Upvotes: 6
Views: 340
Reputation: 8956
I don't really have a solution for the problem, but perhaps some intuition might help you to build applicative parsers more easily. When it comes to applicative, there are two kinds of "sequencing" that needs to be taken into consideration:
When the two sequences match each other well, the result is a very nice and compact representation of the parser in applicative notation. For example:
data Infix = Infix Double Operator Double
infix = Infix <$> number <*> operator <*> number
The problem is that when the sequence don't match exactly, you have to massage the underlying values in order for things to work (you can't change the ordering of the parsers):
number = f <$> sign <*> decimal <*> exponent
where f sign decimal exponent = sign * decimal * 10 ^^ exponent
Here, in order to compute the number you have to do a slightly nontrivial combination of operations, which is done by the local function f
.
Another typical situation is that you need to discard some value:
exponent = oneOf "eE" *> integer
Here, *>
discards the value on the left, keep the value on the right. The <*
operator does opposite, discarding the right and keeping the left. When you have a chain of such operations, you have to decode them using the left-associativity:
p1 *> p2 <* p3 *> p4 <* p5 ≡ (((p1 *> p2) <* p3) *> p4) <* p5
This is artificially contrived: you generally don't want to do this. It's better to break up the expression into meaningful pieces (and preferably give meaningful names). One common pattern that you will see is:
-- discard the result of everything except `p3`
p1 *> p2 *> p3 <* p4 <* p5
There is a small caveat though, if you want to apply something else to p3
or if p3
consists of multiple parts, you will have to use parentheses:
-- applying a pure function
f <$> (p1 *> p2 *> p3 <* p4 <* p5) ≡ p1 *> p2 *> (f <$> p3) <* p4 <* p5
-- p3 consists of multiple parts
p1 *> p2 *> (p3' <*> p3'') <* p4 <* p5)
Again, in these situations it's often better to just break up the expression into meaningful fragments with names.
Applicative notation, in some sense, forces you into dividing up the parsers into logical chunks so that it's easier to read, as opposed to the monadic notation where you could conceivably do everything in one monolithic block.
Upvotes: 4
Reputation: 19637
Well, your example parser is artificially complicated.
There are lots of occurrences of trim
that you can abstract from:
token p = p <* trim
You can also abstract from something that occurs between a pair of matching parentheses:
parens p = token (char '(') *> p <* token (char ')')
Now what's left is:
call =
(\ n as _ r -> Call {name = n, args = as, result = r}) <$>
name <*>
parens (sepBy argument (() <$ token (char ','))) <*>
token (char '=') <*>
result
Finally, you shouldn't count occurrences of _
, but rather learn to use <$
and <*
. Here are useful rules of thumb:
Only use *>
in the combination foo *> p <* bar
such as in parens
above, nowhere else.
Make your parsers have the form f <$> p1 <*> ... <*> pn
, and now choose between <$>
and <$
in the first position or between <*>
and <*
in all other positions purely on the question of whether you're interested in the result of the subsequent parser. If you are, use the variant with the >
, otherwise, use the one without. Then you never need to ignore any arguments in f
, because you're not even getting access to them. In the reduced example case above, only the =
token is left that we're not interested in, so we can say
call = Call <$> name
<*> parens (sepBy argument (() <$ token (char ',')))
<* token (char '=')
<*> result
(This is assuming that Call
actually takes only these three arguments.) I'd argue that this version is easier to read even than your original do
-based version.
To answer your more general question: Yes, it's possible to recognize do
-statements that don't need the power of monads. Simply put, they're the ones which are just a sequence of bindings with a return
in the very end, and all of the bound variables are only used in the final return
and nowhere else. There is a proposal to add this to GHC. (Personally, I'm not a huge fan of it, however. I think applicative notation is more functional than do-notation.)
Upvotes: 9
Reputation: 105886
Write smaller parsers. For example, your arguments seem to be (argument[, argument…])
. That can be easily expressed by
argListP :: Parser [Argument]
argListP = char '(' *> trim *> argument `sepBy` (char ',' *> trim) <* char ')'
which is still quite readable: a '(' followed by whitespace, arguments separated by comma and whitespace, and a trailing ')'. Same can be done for your result
:
resultP :: Parser Result
resultP = trim *> char '=' *> result
As you can see, that is still readable: arbitrary whitespace, followed by equal sign and some kind of result. Now call
is almost trivial:
call :: Parser Call
call = Call <$> name <*> argListP <*> resultP
Upvotes: 4