Reputation: 23
I'm new to Haskell and Parsec --- my apologies if this question is trivial.
I want to parse lines of text that are structured like this:
<Text to be dropped> <special character (say "#")> <field 1> <comma> <field 2>
<comma> <field 3> <special character 2 (say "%")> <Text to be dropped>
I want my parser to discard the "text to be dropped" at the beginning and at the end, and to keep the contents of the fields. My main problem is understanding how to write a parser that drops everything up to a certain special character.
The parsers from the library that seem helpful are anyChar, manyTill and oneOf, but I don't understand how to combine them. I would be grateful for any simple example.
Upvotes: 2
Views: 675
Reputation: 1836
When writing Parsec code, it helps to first write out the grammar you want to parse in BNF, because parsers written in Parsec end up looking very much like the grammar.
Let's try that:
line ::= garbage '#' field ',' field ',' field '%' garbage
In the above production, we assume a production named garbage, whose actual definition will depend on what text you actually want dropped. Likewise, we assume a production named field. Now let's write this production out as Parsec code:
line = do
  garbage
  char '#'
  field1 <- field
  char ','
  field2 <- field
  char ','
  field3 <- field
  char '%'
  garbage
  return (field1, field2, field3)
This code reads exactly like the BNF. The essential difference is that the results of some of the subproductions are named, so that we can return a value built from these results (in this case a tuple).
Now I don't know what your notion of garbage is, but for the sake of example let's assume that you mean any whitespace. Then you could define garbage as follows:
garbage = many space
(Alternatively, Parsec already provides a combinator called spaces that skips zero or more whitespace characters.) If the garbage could be anything except the # delimiter character, then you could say
garbage = many (noneOf "#")
This parser consumes all input up to, but not including, the first '#'. Either way, whatever value garbage produces is thrown away, because you are not binding a name to it.
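Putting the pieces above together, here is a minimal, self-contained sketch. The definition of field (one or more characters other than ',' and '%') is only an assumption for illustration, since the question does not say what a field looks like, and the trailing text is dropped with many anyChar, which is one possible choice for the final garbage.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: a field is one or more characters other than ',' and '%'.
field :: Parser String
field = many1 (noneOf ",%")

-- Everything up to, but not including, the first '#'.
garbage :: Parser String
garbage = many (noneOf "#")

line :: Parser (String, String, String)
line = do
  _ <- garbage
  _ <- char '#'
  field1 <- field
  _ <- char ','
  field2 <- field
  _ <- char ','
  field3 <- field
  _ <- char '%'
  _ <- many anyChar -- trailing text to be dropped
  return (field1, field2, field3)

main :: IO ()
main = do
  -- Succeeds: the three fields are extracted.
  print (parse line "" "junk #a,b,c% more junk")
  -- Fails with a parse error: there is no '#' in the input.
  print (parse line "" "no delimiters here")
```

The first call should print Right ("a","b","c"); the second returns a Left carrying the parse error. Binding the unused results to _ only silences compiler warnings and does not change what the parser accepts.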
Upvotes: 4
Reputation: 2684
Alternatively, you can use applicative parsers:
import Control.Applicative
import Text.Parsec
import Text.Parsec.String
type Field = () --your type here
field = string "()" *> pure () --your parser here
parser :: Parser (Field, Field, Field)
parser = manyTill anyChar (char '#') *>
  ((,,) <$> (field <* char ',')
        <*> (field <* char ',')
        <*> (field <* char '%'))
Upvotes: 1
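To try the applicative version on real input, here is a small sketch with a concrete field parser swapped in. The field definition and the helper name skipPast are assumptions made for this example, not part of the answer above. Note that manyTill anyChar (char '#') consumes the '#' itself, whereas many (noneOf "#") stops just before it.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: a field is one or more characters other than ',' and '%'.
field :: Parser String
field = many1 (noneOf ",%")

-- Hypothetical helper: skip everything up to and including the first c.
skipPast :: Char -> Parser ()
skipPast c = () <$ manyTill anyChar (char c)

fields :: Parser (String, String, String)
fields =
  skipPast '#' *>
    ((,,) <$> (field <* char ',')
          <*> (field <* char ',')
          <*> (field <* char '%'))
    <* many anyChar -- drop whatever follows the '%'

main :: IO ()
main = print (parse fields "" "header text#x,y,z%trailer")
```

Running this should print Right ("x","y","z"). Control.Applicative is not imported here because a recent Prelude already exports <$>, <*>, *> and <*, and leaving it out avoids a name clash with Parsec's own many.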