lightandlight

Generating a parser given a list of tokens

Background

I'm trying to implement a date printing and parsing system using Parsec.

I have successfully implemented a printing function of type

showDate :: String -> Date -> Parser String

It parses a format string and builds a new string from the tokens that the format string contains.

For example

showDate "%d-%m-%Y" $ Date 2015 3 17

has the output Right "17-3-2015"

I had already written a tokenizer for showDate, so I thought I could reuse its output to generate a parser with a function readDate :: [Token] -> Parser Date. The idea quickly stalled when I realised I had no idea how to implement it.
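A minimal sketch of such a tokenizer, plus a printer driven by the same tokens, might look like the following. It assumes the format codes %d, %m and %Y from the example above and treats every other character as literal text. tokenize and render are hypothetical names, DayOrdinal is omitted, and render is a pure function rather than the Parser-based showDate described here. The Date field order (year, month, day) is inferred from Date 2015 3 17.

```haskell
-- Token type from the question, minus DayOrdinal (omitted in this sketch).
data Token = DayNumber | Year | MonthNumber | Literal String
  deriving (Show, Eq)

-- Field order assumed from the example Date 2015 3 17: year, month, day.
data Date = Date Int Int Int
  deriving (Show, Eq)

-- Hypothetical tokenizer: "%d", "%m" and "%Y" become field tokens; every
-- other character is collected into runs of Literal text.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('%' : 'd' : rest) = DayNumber : tokenize rest
tokenize ('%' : 'm' : rest) = MonthNumber : tokenize rest
tokenize ('%' : 'Y' : rest) = Year : tokenize rest
tokenize (c : rest) = case tokenize rest of
  Literal s : toks -> Literal (c : s) : toks -- extend the following literal run
  toks -> Literal [c] : toks

-- Hypothetical printer driven by the same tokens (no zero padding).
render :: [Token] -> Date -> String
render toks (Date y m d) = concatMap piece toks
  where
    piece DayNumber = show d
    piece MonthNumber = show m
    piece Year = show y
    piece (Literal s) = s
```

With these definitions, tokenize "%d-%m-%Y" is [DayNumber,Literal "-",MonthNumber,Literal "-",Year] and render (tokenize "%d-%m-%Y") (Date 2015 3 17) is "17-3-2015". Once a format is a [Token], printing and parsing can both be written as interpreters over the same list.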

What I want to accomplish

Assume we have the following functions and types (the implementation doesn't matter):

data Token = DayNumber | Year | MonthNumber | DayOrdinal | Literal String

-- Parses four digits and returns an integer
pYear :: Parser Integer

-- Parses two digits and returns an integer
pMonthNum :: Parser Int

-- Parses two digits and returns an integer
pDayNum :: Parser Int

-- Parses two digits and an ordinal suffix and returns an integer
pDayOrd :: Parser Int

-- Parses a string literal
pLiteral :: String -> Parser String

The parser readDate [DayNumber,Literal "-",MonthNumber,Literal "-",Year] should be equivalent to

do
     d <- pDayNum
     pLiteral "-"
     m <- pMonthNum
     pLiteral "-"
     y <- pYear
     return $ Date y m d

Similarly, the parser readDate [Literal "~~", MonthNumber,Literal "hello",DayNumber,Literal " ",Year] should be equivalent to

do
     pLiteral "~~"
     m <- pMonthNum
     pLiteral "hello"
     d <- pDayNum
     pLiteral " "
     y <- pYear
     return $ Date y m d

My intuition says there's some kind of concat/map/fold over monadic binds that would do this, but I can't see what it is.
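A fold of that kind does exist. As a generic illustration that is not tied to Parsec, chaining a list of steps of type s -> m s, each feeding its result into the next, is a left fold with >>=, and foldM from Control.Monad expresses the same thing. The names chainSteps, chainSteps' and the toy Maybe step halve are made up for this sketch.

```haskell
import Control.Monad (foldM)

-- Chain monadic steps left to right:
-- foldl (>>=) (return s) [f, g, h] == ((return s >>= f) >>= g) >>= h
chainSteps :: Monad m => s -> [s -> m s] -> m s
chainSteps start = foldl (>>=) (return start)

-- The same chaining expressed with foldM.
chainSteps' :: Monad m => s -> [s -> m s] -> m s
chainSteps' = foldM (\s step -> step s)

-- Toy step in the Maybe monad: succeeds only on even numbers.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing
```

chainSteps 12 [halve, halve] evaluates to Just 3, and chainSteps 12 [halve, halve, halve] to Nothing because the third step fails on the odd number 3; chainSteps' agrees on both. In a parser monad, a failing step would show up as a parse error instead of Nothing.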

Questions

Is parsec the right tool for this?

Is my approach convoluted or ineffective?


Answers (1)

Cirdec

Your Tokens are instructions in a small language for date formats; a [Token] is a program in that language.

import Data.Functor
import Text.Parsec
import Text.Parsec.String

data Date = Date Int Int Int deriving (Show)

data Token = DayNumber | Year | MonthNumber | Literal String

In order to interpret this language, we need a type that represents the state of the interpreter. We start off not knowing any of the components of the Date and then discover them as we encounter DayNumber, Year, or MonthNumber. The following DateState represents the state of knowing or not knowing each of the components of the Date.

data DateState = DateState {dayState :: Maybe Int, monthState :: Maybe Int, yearState :: Maybe Int}

We will start interpreting a [Token] with DateState Nothing Nothing Nothing.

Each Token will be converted into a function that reads the DateState and produces a parser that computes the new DateState.

readDateToken :: Token -> DateState -> Parser DateState
readDateToken (DayNumber) ds =
    do
        day <- pNatural
        return ds {dayState = Just day}
readDateToken (MonthNumber) ds =
    do
        month <- pNatural
        return ds {monthState = Just month}
readDateToken (Year) ds =
    do
        year <- pNatural
        return ds {yearState = Just year}
readDateToken (Literal l) ds = string l >> return ds

pNatural :: Num a => Parser a
pNatural = fromInteger . read <$> many1 digit
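pNatural reads as many digits as it can, which works for these examples because every numeric field is followed by a literal or by the end of input. The question's parsers are instead described as reading a fixed number of digits. A sketch of that variant, using Parsec's count combinator and a hypothetical helper pFixed, could look like this; the widths come from the question's comments, and giving pYear the type Parser Int (the question declares Parser Integer) is an assumption made to match this answer's Date.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: read exactly n digits (n >= 1) as a number.
pFixed :: Num a => Int -> Parser a
pFixed n = fromInteger . read <$> count n digit

-- Fixed-width field parsers; widths follow the question's comments.
pYear :: Parser Int
pYear = pFixed 4

pMonthNum :: Parser Int
pMonthNum = pFixed 2

pDayNum :: Parser Int
pDayNum = pFixed 2
```

With fixed widths, a format without separators such as year, month, day on "20150317" can be parsed, whereas with many1 digit the first field would consume all eight digits. The trade-off is that "3" is no longer accepted as a month; it must be written "03".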

To read a date by interpreting a [Token], we first convert it into a list of functions that decide how to parse a new state from the current state, using map readDateToken :: [Token] -> [DateState -> Parser DateState]. Then, starting from a parser that succeeds with the initial state, return (DateState Nothing Nothing Nothing), we bind all of these functions together with >>=. If the resulting DateState doesn't completely define the Date, we complain that the [Token] was invalid; we could also have checked this ahead of time. If you want invalid dates reported as parse errors, this is also the place to check that the Date is valid and doesn't represent a non-existent date like April 31st.

readDate :: [Token] -> Parser Date
readDate tokens =
    do
        dateState <- foldl (>>=) (return (DateState Nothing Nothing Nothing)) . map readDateToken $ tokens
        case dateState of 
            DateState (Just day) (Just month) (Just year) -> return (Date day month year)
            _                                             -> fail "Date format is incomplete"
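The completeness check can indeed happen before any input is read. Here is a minimal base-only sketch, with hypothetical names fields and checkFormat, that rejects a format which never reads some field or reads one more than once (unlike readDate above, which lets the last occurrence win).

```haskell
-- Token type from the answer; Eq is derived so tokens can be counted.
data Token = DayNumber | Year | MonthNumber | Literal String
  deriving (Show, Eq)

-- Hypothetical table of the date fields and the token that reads each one.
fields :: [(String, Token)]
fields = [("day", DayNumber), ("month", MonthNumber), ("year", Year)]

-- Hypothetical pre-check: reject a format that never reads some field or
-- reads a field more than once; otherwise return the format unchanged.
checkFormat :: [Token] -> Either String [Token]
checkFormat tokens
  | not (null missing) = Left ("format never reads: " ++ unwords missing)
  | not (null repeated) = Left ("format reads more than once: " ++ unwords repeated)
  | otherwise = Right tokens
  where
    occurrences tok = length (filter (== tok) tokens)
    missing = [name | (name, tok) <- fields, occurrences tok == 0]
    repeated = [name | (name, tok) <- fields, occurrences tok > 1]
```

readDate could pattern match on checkFormat tokens first and call fail with the Left message, so a format like [DayNumber,Literal "-",MonthNumber] would be reported as "format never reads: year" instead of the more generic "Date format is incomplete".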

We will run a few examples.

runp p s = runParser p () "runp" s

main = do
    print . runp (readDate [DayNumber,Literal "-",MonthNumber,Literal "-",Year])                   $ "12-3-456"
    print . runp (readDate [Literal "~~", MonthNumber,Literal "hello",DayNumber,Literal " ",Year]) $ "~~3hello12 456"
    print . runp (readDate [DayNumber,Literal "-",MonthNumber,Literal "-",Year,Literal "-",Year])  $ "12-3-456-789"
    print . runp (readDate [DayNumber,Literal "-",MonthNumber])                                    $ "12-3"

This results in the following outputs. Notice that when we asked to read the Year twice, the second of the two years was used in the Date. You can choose a different behavior by modifying readDateToken and possibly the DateState type. When the [Token] doesn't specify how to read one of the date fields, we get the error Date format is incomplete, but the message also includes "unexpected end of input" and "expecting digit", left over from the last pNatural, so the description is slightly misleading; this could be improved.

Right (Date 12 3 456)
Right (Date 12 3 456)
Right (Date 12 3 789)
Left "runp" (line 1, column 5):
unexpected end of input
expecting digit
Date format is incomplete
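For the calendar check mentioned earlier, here is a sketch that repeats this answer's definitions (readDateToken condensed with <$> and <$), folds with foldM from Control.Monad instead of foldl (>>=), and rejects non-existent dates such as April 31st. The daysInMonth helper and its Gregorian leap-year rule are assumptions of this sketch; if the time package is available, fromGregorianValid from Data.Time.Calendar is an alternative.

```haskell
import Control.Monad (foldM)
import Text.Parsec
import Text.Parsec.String (Parser)

data Date = Date Int Int Int deriving (Show, Eq) -- day, month, year

data Token = DayNumber | Year | MonthNumber | Literal String

data DateState = DateState
  {dayState :: Maybe Int, monthState :: Maybe Int, yearState :: Maybe Int}

pNatural :: Parser Int
pNatural = read <$> many1 digit

-- Each token updates the state; literals leave it unchanged.
readDateToken :: Token -> DateState -> Parser DateState
readDateToken DayNumber ds = (\d -> ds {dayState = Just d}) <$> pNatural
readDateToken MonthNumber ds = (\m -> ds {monthState = Just m}) <$> pNatural
readDateToken Year ds = (\y -> ds {yearState = Just y}) <$> pNatural
readDateToken (Literal l) ds = ds <$ string l

-- Assumed Gregorian month lengths: leap years are divisible by 4,
-- except centuries not divisible by 400.
daysInMonth :: Int -> Int -> Int
daysInMonth year month
  | month == 2 = if leap then 29 else 28
  | month `elem` [4, 6, 9, 11] = 30
  | otherwise = 31
  where
    leap = (year `mod` 4 == 0 && year `mod` 100 /= 0) || year `mod` 400 == 0

-- Same fold as readDate above, via foldM, plus a calendar check.
readDate :: [Token] -> Parser Date
readDate tokens = do
  ds <- foldM (flip readDateToken) (DateState Nothing Nothing Nothing) tokens
  case ds of
    DateState (Just d) (Just m) (Just y)
      | m < 1 || m > 12 -> fail ("no such month: " ++ show m)
      | d < 1 || d > daysInMonth y m -> fail ("no such day: " ++ show d)
      | otherwise -> return (Date d m y)
    _ -> fail "Date format is incomplete"
```

With fmt = [DayNumber,Literal "-",MonthNumber,Literal "-",Year], the inputs "31-4-2015" and "29-2-1900" now fail while "29-2-2016" succeeds. Note that readDate does not require the whole input to be consumed, so "12-3-456xyz" still parses; running readDate fmt <* eof rejects the trailing text.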
