Reading Roman numerals in Haskell

Question

I've been playing with Haskell data types for the past few days, using a custom type to work with Roman numerals:

data RomanNumeral = I | IV | V | IX | X | XL | L | XC | C | CD | D | CM | M deriving (Eq, Ord, Show, Read)

stringToRomanNumeral :: String -> Maybe [RomanNumeral]
stringToRomanNumeral romString
    | unRoman = Nothing
    | otherwise = Just $ map read $ map (\x -> [x]) romStringUpper
    where 
        romStringUpper = map C.toUpper romString
        unRoman = any (`notElem` "MDCLXVI") romStringUpper

This works fine, but catches only 1-char numerals (so I have to calculate the value of IV, IX etc. later on).

Is there a way to read (or reads) the input string such that the returned value of Maybe [RomanNumeral] contains 2-char numerals, too? I tried dabbling with pattern matching, but I cannot seem to get the type right.

Daniel Fischer · Accepted Answer

Using reads doesn't work well because it expects tokens, it won't split up e.g. "XIV" into "X" and "IV" to obtain two parseable parts, it regards the entire char sequence as one token since they belong to the same character class. You can write your own parser for roman numerals (and you should try, writing parsers is fun) taking care of special sequences.

A simplistic approach is

module Roman where

import Data.Char as C

data RomanNumeral = I | IV | V | IX | X | XL | L | XC | C | CD | D | CM | M
    deriving (Eq, Ord, Show, Read)

stringToRomanNumeral :: String -> Maybe [RomanNumeral]
stringToRomanNumeral = fmap collate . sequence . map (toRom . C.toUpper)
  where
    romDict = zip "IVXLCDM" [I,V,X,L,C,D,M]
    toRom = flip lookup romDict

collate :: [RomanNumeral] -> [RomanNumeral]
collate (x:ys@(y:zs)) = case lookup (x,y) collationDict of
                          Just v -> v : collate zs
                          Nothing -> x : collate ys
collate xs = xs

collationDict :: [((RomanNumeral,RomanNumeral),RomanNumeral)]
collationDict =
    [ ((I,V),IV)
    , ((I,X),IX)
    , ((X,L),XL)
    , ((X,C),XC)
    , ((C,D),CD)
    , ((C,M),CM)
    ]

it's not very flexible either, any bad character will lead to a Nothing result, but that's easily modifiable (one could use catMaybes instead of sequence to simply ignore invalid characters for example). And it does not check the general (modern) 'decreasing value' rule (which makes interpreting 'IX' as 9 instead of 11 possible). Checking validity can however be done after parsing.

Reading Roman numerals in Haskell

Answers (2)

Related Questions