How do I generalize a Parser which extracts values from tokens?

Question

I'm creating a parser in Haskell for a simple language. The parser takes a list of tokens, generated by a separate tokenizer, and turns it into a structured tree. While I was writing this Parser, I came across a reoccurring pattern that I would like to generalize. Here are the relevant definitions:

--Parser Definition
data Parser a b = Parser { runParser :: [a] -> Maybe ([a], b) }

--Token Definition
data Token = TokOp    { getTokOp :: Operator }
           | TokInt   { getTokInt :: Int }
           | TokIdent { getTokIdent :: String }
           | TokAssign
           | TokLParen
           | TokRParen
   deriving (Eq, Show)

The parser also has instances defined for MonadPlus and all of its super classes. Here are two examples of the reoccurring Pattern I am trying to generalize:

-- Identifier Parser
identP :: Parser Token String
identP = Parser $ \input -> case input of
  TokIdent s : rest -> Just (rest, s)
  _                 -> Nothing

--Integer Parser
intP :: Parser Token Int
intP = Parser $ \input -> case input of
  TokInt n : rest -> Just (rest, n)
  _               -> Nothing

As you can see, these two examples are incredibly similar, yet I see no way generalize it. Ideally I would want some function of type extractToken :: ?? -> Parser Token a where a is the value contained by the token. I am guessing the solution involves >>=, but I am not experienced enough with Monads to figure it out. This also may be impossible, I am not sure.

K. A. Buhr · Accepted Answer

It seems difficult to avoid at least some boilerplate. One simple approach is to manually define the field selectors to return Maybes:

{-# LANGUAGE LambdaCase #-}

data Token = TokOp Operator
           | TokInt Int
           | TokIdent String
           | TokAssign
           | TokLParen
           | TokRParen
   deriving (Eq, Show)

getTokOp    = \case { TokOp x    -> Just x ; _ -> Nothing }
getTokInt   = \case { TokInt x   -> Just x ; _ -> Nothing }
getTokIdent = \case { TokIdent x -> Just x ; _ -> Nothing }

and then the rest is:

fieldP :: (Token -> Maybe a) -> Parser Token a
fieldP sel = Parser $ \case tok:rest -> (,) rest <$> sel tok
                            []       -> Nothing

opP    = fieldP getTokOp
identP = fieldP getTokIdent
intP   = fieldP getTokInt

You could derive the getXXX selectors using Template Haskell or generics, though it's likely not worth it.

How do I generalize a Parser which extracts values from tokens?

Answers (1)

Related Questions