Reputation: 146
I've been learning some Haskell lately, and I thought a lexer might be a fun project. I'm using this ANSI C Yacc grammar as a guide.
The general program structure is:
lex :: [Char] -> Maybe [Token]
lex s =
    case tokenize ([], s) of
        Just (tokens, []) -> Just tokens
        _                 -> Nothing
tokenize :: ([Token], [Char]) -> Maybe ([Token], [Char])
where tokenize builds up the list of tokens. I'm having trouble thinking of a suitable structure for tokenize. For example, to match keywords like int, I could write:
tokenize (toks, 'i':'n':'t':' ':rest) = tokenize (toks++[TokenKeyword IntK], rest)
But this seems like a terrible way to do things. Is there a way to pattern match against elements in a list? Could I create a list of all keywords, and attempt to match them as prefixes of the input string?
Upvotes: 0
Views: 402
Reputation: 1310
If you want to match based on a string prefix, you could use the ViewPatterns extension. It can be enabled by passing -XViewPatterns to the compiler, by running :set -XViewPatterns in GHCi, or by putting {-# LANGUAGE ViewPatterns #-} at the top of the file.
Then, you can write a function matchPrefix (not 100% optimal, as it iterates over prefix twice):
import Data.List (isPrefixOf)

matchPrefix :: String -> String -> Maybe String
matchPrefix prefix result
    | prefix `isPrefixOf` result = Just (drop (length prefix) result)
    | otherwise                  = Nothing
And then use it in a pattern like the following:
startsWithInt :: String -> Bool
startsWithInt (matchPrefix "int " -> Just rest) = True
startsWithInt _ = False
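To show how this can drive an actual tokenizer, here is a minimal runnable sketch. The Keyword and Token types and the single "int " keyword are assumptions for illustration, not part of your code; it uses the standard Data.List.stripPrefix, which behaves like matchPrefix:

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.List (stripPrefix)

-- Hypothetical token types, for illustration only
data Keyword = IntK deriving (Show, Eq)
data Token = TokenKeyword Keyword deriving (Show, Eq)

-- Consume "int " keywords from the front of the input;
-- succeed on the empty string, fail on anything else.
tokenize :: String -> Maybe [Token]
tokenize []                                = Just []
tokenize (stripPrefix "int " -> Just rest) = (TokenKeyword IntK :) <$> tokenize rest
tokenize _                                 = Nothing
```

For example, tokenize "int int " gives Just [TokenKeyword IntK, TokenKeyword IntK], while tokenize "float " gives Nothing.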
If you wanted to match against a list of keywords, and get back both the rest of the string and which keyword matched, you could do that by modifying matchPrefix accordingly.
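As a sketch of that modification (the TokenKeyword String constructor and the keyword list are assumptions for illustration), you could try each keyword in turn and return the first match together with the remaining input:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)

-- Hypothetical token type, for illustration only
data Token = TokenKeyword String deriving (Show, Eq)

-- Try each keyword as a prefix of the input; return the first
-- match together with the rest of the string.
matchKeyword :: [String] -> String -> Maybe (Token, String)
matchKeyword keywords input = listToMaybe
    [ (TokenKeyword kw, rest)
    | kw <- keywords
    , Just rest <- [stripPrefix kw input]
    ]
```

For example, matchKeyword ["int", "return"] "int x;" gives Just (TokenKeyword "int", " x;").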
Upvotes: 1