Reputation: 146
I've been learning some Haskell lately, and I thought a lexer might be a fun project. I'm using this ANSI C Yacc grammar as a guide.
The general program structure is:
lex :: [Char] -> Maybe [Token]
lex s =
    case tokenize ([], s) of
        Just (tokens, []) -> Just tokens
        _                 -> Nothing
tokenize :: ([Token], [Char]) -> Maybe ([Token], [Char])
where tokenize builds up the list of tokens. I'm having trouble thinking of a suitable structure for tokenize. For example, to match keywords like int, I could write:
tokenize (toks, 'i':'n':'t':' ':rest) = tokenize (toks++[TokenKeyword IntK], rest)
But this seems like a terrible way to do things. Is there a way to pattern match against elements in a list? Could I create a list of all keywords, and attempt to match them as prefixes of the input string?
Upvotes: 0
Views: 402
Reputation: 1310
If you want to match based on a string prefix, you could use the ViewPatterns extension. It can be enabled by passing -XViewPatterns to the compiler, by running :set -XViewPatterns in GHCi, or by putting {-# LANGUAGE ViewPatterns #-} at the top of the file.
Then, you can write a function matchPrefix (not 100% optimal, as it iterates over prefix twice):
import Data.List (isPrefixOf)

matchPrefix :: String -> String -> Maybe String
matchPrefix prefix result
    | prefix `isPrefixOf` result = Just (drop (length prefix) result)
    | otherwise                  = Nothing
And then use it in a pattern like the following:
startsWithInt :: String -> Bool
startsWithInt (matchPrefix "int " -> Just rest) = True
startsWithInt _ = False
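To show how this can drive an actual tokenizer, here is a minimal runnable sketch. The Keyword and Token types and the single "int " keyword are assumptions for illustration, not part of your code; it uses the standard Data.List.stripPrefix, which behaves like matchPrefix:

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.List (stripPrefix)

-- Hypothetical token types, for illustration only
data Keyword = IntK deriving (Show, Eq)
data Token = TokenKeyword Keyword deriving (Show, Eq)

-- Consume "int " keywords from the front of the input;
-- succeed on the empty string, fail on anything else.
tokenize :: String -> Maybe [Token]
tokenize []                                = Just []
tokenize (stripPrefix "int " -> Just rest) = (TokenKeyword IntK :) <$> tokenize rest
tokenize _                                 = Nothing
```

For example, tokenize "int int " gives Just [TokenKeyword IntK, TokenKeyword IntK], while tokenize "float " gives Nothing.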
If you wanted to match against a list of keywords, and get back both the rest of the string and which keyword matched, you could do that by modifying matchPrefix accordingly.
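As a sketch of that modification (the TokenKeyword String constructor and the keyword list are assumptions for illustration), you could try each keyword in turn and return the first match together with the remaining input:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)

-- Hypothetical token type, for illustration only
data Token = TokenKeyword String deriving (Show, Eq)

-- Try each keyword as a prefix of the input; return the first
-- match together with the rest of the string.
matchKeyword :: [String] -> String -> Maybe (Token, String)
matchKeyword keywords input = listToMaybe
    [ (TokenKeyword kw, rest)
    | kw <- keywords
    , Just rest <- [stripPrefix kw input]
    ]
```

For example, matchKeyword ["int", "return"] "int x;" gives Just (TokenKeyword "int", " x;").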
Upvotes: 1