janeden
janeden

Reputation: 461

Reading Roman numerals in Haskell

I've been playing with Haskell data types for the past few days, using a custom type to work with Roman numerals:

data RomanNumeral = I | IV | V | IX | X | XL | L | XC | C | CD | D | CM | M deriving (Eq, Ord, Show, Read)

stringToRomanNumeral :: String -> Maybe [RomanNumeral]
stringToRomanNumeral romString
    | unRoman = Nothing
    | otherwise = Just $ map read $ map (\x -> [x]) romStringUpper
    where 
        romStringUpper = map C.toUpper romString
        unRoman = any (`notElem` "MDCLXVI") romStringUpper

This works fine, but catches only 1-char numerals (so I have to calculate the value of IV, IX etc. later on).

Is there a way to read (or reads) the input string such that the returned value of Maybe [RomanNumeral] contains 2-char numerals, too? I tried dabbling with pattern matching, but I cannot seem to get the type right.

Upvotes: 1

Views: 1110

Answers (2)

Daniel Fischer
Daniel Fischer

Reputation: 183858

Using reads doesn't work well because it expects tokens, it won't split up e.g. "XIV" into "X" and "IV" to obtain two parseable parts, it regards the entire char sequence as one token since they belong to the same character class. You can write your own parser for roman numerals (and you should try, writing parsers is fun) taking care of special sequences.

A simplistic approach is

module Roman where

import Data.Char as C

data RomanNumeral = I | IV | V | IX | X | XL | L | XC | C | CD | D | CM | M
    deriving (Eq, Ord, Show, Read)

stringToRomanNumeral :: String -> Maybe [RomanNumeral]
stringToRomanNumeral = fmap collate . sequence . map (toRom . C.toUpper)
  where
    romDict = zip "IVXLCDM" [I,V,X,L,C,D,M]
    toRom = flip lookup romDict

collate :: [RomanNumeral] -> [RomanNumeral]
collate (x:ys@(y:zs)) = case lookup (x,y) collationDict of
                          Just v -> v : collate zs
                          Nothing -> x : collate ys
collate xs = xs

collationDict :: [((RomanNumeral,RomanNumeral),RomanNumeral)]
collationDict =
    [ ((I,V),IV)
    , ((I,X),IX)
    , ((X,L),XL)
    , ((X,C),XC)
    , ((C,D),CD)
    , ((C,M),CM)
    ]

it's not very flexible either, any bad character will lead to a Nothing result, but that's easily modifiable (one could use catMaybes instead of sequence to simply ignore invalid characters for example). And it does not check the general (modern) 'decreasing value' rule (which makes interpreting 'IX' as 9 instead of 11 possible). Checking validity can however be done after parsing.

Upvotes: 3

dpington
dpington

Reputation: 1884

I think the current way you are using the RomanNumeral datatype is inherently a bad idea.

You should also override show/read instead of relying on the default set.

Perhaps

-- underlying datatype as int
data RomanNumeral = RomanNumeral Int

instance Show RomanNumeral where
    show (RomanNumeral x) = -- your code for converting a roman numeral to a string

instance Read RomanNumeral where
    readsPred d r = -- your code for reading a string into an integer and then return a RomanNumeral

Upvotes: 0

Related Questions