user1720555

Reputation: 23

Dropping text up to a special character with Parsec

I'm new to Haskell and Parsec, so my apologies if this question is trivial.

I want to parse lines of text that are structured like this:

<Text to be dropped> <special character (say "#")> <field 1> <comma> <field 2>
<comma> <field 3> <special character 2 (say "%")> <Text to be dropped>

I want my parser to discard the "text to be dropped" at the beginning and at the end, and to keep the contents of the fields. My main problem is understanding how to write a parser that drops everything up to a certain special character.

The parsers from the library that seem helpful are anyChar, manyTill and oneOf, but I don't understand how to combine them. I would be grateful for any simple example.

Upvotes: 2

Views: 675

Answers (2)

macron

Reputation: 1836

When writing Parsec code, it is useful to first write out the grammar you want to parse in BNF, because parsers written in Parsec end up looking very much like the grammar.

Let's try that:

line ::= garbage '#' field ',' field ',' field '%' garbage

The production above assumes a production named garbage, whose definition depends on what text you actually want dropped. Likewise, it assumes a production named field. Now let's write this production out as Parsec code:

line = do
  garbage
  char '#'
  field1 <- field
  char ','
  field2 <- field
  char ','
  field3 <- field
  char '%'
  garbage
  return (field1, field2, field3)

This code reads exactly like the BNF. The essential difference is that the results of some of the subproductions are named, so that we can return a value built from these results (in this case a tuple).

Now, I don't know what your notion of garbage is, but for the sake of example let's assume that you mean any whitespace. Then you could define garbage as follows:

garbage = many space

Alternatively, Parsec already has a combinator called spaces that skips zero or more whitespace characters. If the garbage could be anything except the # delimiter character, then you could say

garbage = many (noneOf "#")

This line will munch all input up to and excluding the first '#'. Either way, whatever value garbage produces as a result, since you are not binding a name to the value it will be thrown away.
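Putting the pieces together, a minimal self-contained sketch might look like the following. The definition of field (a non-empty run of characters other than ',' and '%'), the split into leadingGarbage and trailingGarbage, and the helper parseLine are all assumptions made for illustration; substitute your own definitions as needed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: a field is a non-empty run of characters other than the
-- separators ',' and '%'. Replace this with your real field parser.
field :: Parser String
field = many1 (noneOf ",%")

-- Everything up to, but not including, the first '#'.
leadingGarbage :: Parser String
leadingGarbage = many (noneOf "#")

-- Assumption: the trailing text simply runs to the end of the input.
trailingGarbage :: Parser String
trailingGarbage = many anyChar

line :: Parser (String, String, String)
line = do
  _ <- leadingGarbage
  _ <- char '#'
  field1 <- field
  _ <- char ','
  field2 <- field
  _ <- char ','
  field3 <- field
  _ <- char '%'
  _ <- trailingGarbage
  return (field1, field2, field3)

-- Hypothetical helper: run the parser on a single line of input.
parseLine :: String -> Either ParseError (String, String, String)
parseLine = parse line "<input>"
```

With these definitions, parseLine "junk #a,b,c% more junk" yields Right ("a","b","c"), while an input without a '#' yields a Left carrying the parse error.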

Upvotes: 4

Emily

Reputation: 2684

Alternatively, you can use applicative parsers:

import Control.Applicative ((<$>), (<*>), (<*), (*>), pure)  -- explicit list avoids clashing with Parsec's many and (<|>)
import Text.Parsec
import Text.Parsec.String

type Field = ()                 -- your type here

field :: Parser Field
field = string "()" *> pure ()  -- your parser here

parser :: Parser (Field, Field, Field)
parser = manyTill anyChar (char '#') *>
         ((,,) <$> (field <* char ',')
               <*> (field <* char ',')
               <*> (field <* char '%'))
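To try the applicative version on actual input, the placeholder field parser has to be swapped for a concrete one. The sketch below does that under stated assumptions: the String field definition and the names fields and fieldsOf are hypothetical and not part of the answer above. Note that manyTill anyChar (char '#') consumes the '#' itself, so no separate char '#' is needed afterwards.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: a field is a non-empty run of characters other than ',' and '%'.
field :: Parser String
field = many1 (noneOf ",%")

-- Discard characters until the first '#' has been consumed, then read
-- three fields separated by ',' and terminated by '%'.
fields :: Parser (String, String, String)
fields = manyTill anyChar (char '#') *>
         ((,,) <$> (field <* char ',')
               <*> (field <* char ',')
               <*> (field <* char '%'))

-- Hypothetical helper: run the parser on a single line of input.
fieldsOf :: String -> Either ParseError (String, String, String)
fieldsOf = parse fields "<input>"
```

Here fieldsOf "header text #1,2,3% trailer" yields Right ("1","2","3"). The text after '%' is simply left unconsumed, since parse does not require the parser to reach the end of input; add eof at the end if leftover input should count as an error.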

Upvotes: 1
