Reputation: 28861
I am trying to create a compiler using Haskell as part of my university coursework.
I want to create a function that matches any string like this:
int a = 5
int foo = 3
So this is the function I created:
readInstruction :: String -> String
readInstruction ( 'i' : 'n' : 't' : ' ' : varName : ' ' : '=' : ' ' : val : []) =
  "Declare Int " ++ [varName] ++ " = " ++ [val]
However, this only works for single-letter variable names (and single-character values). How should I do this?
Also, as a side note, I noticed that the following does not compile:
readInstruction ( "int " ++ varName ++ " = " ++ val ) =
  "Declare Int " ++ varName ++ " = " ++ val
Why?
Please note that I'm new to Haskell and only know the basics. I don't know any other library functions and would prefer not to use them, as I have been discouraged from using them in my coursework.
Upvotes: 0
Views: 182
Reputation: 12070
There are many ways to solve this problem. This would be my order of preference:
1. Use a parser library, like Parsec.
2. Use a regex.
3. Use Prelude functions like splitAt.
4. Write the matching yourself, using only pattern matching and recursion.
But since you can't use any libraries, you will have to go with messy solution 4.
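For comparison, option 3 might look roughly like this. It is a minimal sketch that assumes the input has exactly the form "int <name> = <value>" with single spaces; it uses only the Prelude functions splitAt and break (which your coursework rules may still forbid), and it returns a Maybe String instead of the String in the question so that a non-matching line can be reported as Nothing.

```haskell
-- Option 3 sketch: Prelude-only parsing with splitAt and break.
-- Assumption: the input has exactly the form "int <name> = <value>".
readInstruction :: String -> Maybe String
readInstruction s
  | keyword == "int " && eq == " = " && not (null varName) && not (null val) =
      Just ("Declare Int " ++ varName ++ " = " ++ val)
  | otherwise = Nothing
  where
    (keyword, rest) = splitAt 4 s          -- the "int " keyword and what follows
    (varName, rest') = break (== ' ') rest -- the name runs up to the next space
    (eq, val) = splitAt 3 rest'            -- the " = " separator and the value

main :: IO ()
main = do
  print (readInstruction "int foo = 3")  -- Just "Declare Int foo = 3"
  print (readInstruction "float x = 1")  -- Nothing
```

Note how many fixed offsets (4, 3) this bakes in; any change to the spacing breaks it, which is one reason a real parser is preferred.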
You've already shown us how to match the int part, so you just need the remaining parts. Since this is homework, I won't give you the answer, but I will give you one possible outline.
What you could do is break the problem into parts and write multiple matcher functions of the type
showPart :: String -> String
where the parts would be something like showVarName, showEq, etc. Each part would need to consume part of the text, then call the next part (so ultimately you would only need to call the first part; the rest would be consumed in order). The only large modification from what you have above would be the need for recursion in variable-length parts, like showVarName.
showVarName :: String -> String
showVarName (c:rest) | isAlphaNum c = c : showVarName rest
showVarName rest = showEq rest -- not part of the name any more: call the next part here
(Yes, I added a new function, isAlphaNum. You will need something like it, although it could be written using pattern matching if needed; Data.Char exports one if you are allowed to import it.)
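If Data.Char is off-limits, a hand-written version can be quite small. The sketch below is a hypothetical stand-in that only recognises ASCII letters and digits (the Data.Char version also accepts non-ASCII letters), and it uses plain Char comparisons rather than listing every character as a separate pattern.

```haskell
-- Hypothetical hand-written replacement for Data.Char.isAlphaNum.
-- ASCII letters and digits only; uses Char comparisons, so no imports.
isAlphaNum :: Char -> Bool
isAlphaNum c = between 'a' 'z' || between 'A' 'Z' || between '0' '9'
  where
    between lo hi = lo <= c && c <= hi

main :: IO ()
main = print (map isAlphaNum "a Z9=")  -- [True,False,True,True,False]
```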
This will solve the problem, but note that the solution will be very brittle. It will be hard to make any changes: to the ordering of the parts, to the type (what if the RHS can be a variable, or a full expression?), to the allowed formats (e.g., what if varName can be of the form [alpha][alphaNum]*?), or to the output (what if you want to output a fully parsed expression tree, then use it in multiple ways, including plugging it into a show function?).
In practice, no one would really parse this way, and I suspect that may be one of the lessons your professor is trying to illustrate.
Upvotes: 4
Reputation: 54058
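When you're pattern matching, you can only match on constructors (and literals). For lists, the two constructors are : and [], whereas ++ is an ordinary function on lists. The compiler can't work backwards from a function application, but it can from a constructor application (a very special kind of function that Haskell distinguishes syntactically: constructor names start with an uppercase letter, and constructor operators start with a colon). Even if it could invert ++, a pattern like "int " ++ varName ++ " = " ++ val would be ambiguous: an input such as int a = b = c can be split into varName and val in two different ways.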
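You can still get the effect of that ++ pattern with constructor patterns and recursion alone. In this minimal sketch, splitAtEq is a hypothetical helper (not a library function) that walks the string one Char at a time and splits it at the first " = ", which is also how it resolves the ambiguity: for int a = b = c it picks varName = "a". It assumes well-formed input and simply calls error otherwise.

```haskell
-- Match the "int " prefix with (:) patterns, as in the question,
-- then hand the rest to a recursive helper.
readInstruction :: String -> String
readInstruction ('i':'n':'t':' ':rest) = "Declare Int " ++ name ++ " = " ++ val
  where (name, val) = splitAtEq rest
readInstruction other = error ("unrecognised instruction: " ++ other)

-- Hypothetical helper: split a string at the first " = ".
-- If there is no " = ", the whole string becomes the name and val is "".
splitAtEq :: String -> (String, String)
splitAtEq (' ':'=':' ':val) = ([], val)
splitAtEq (c:cs) = (c : name, val)
  where (name, val) = splitAtEq cs
splitAtEq [] = ([], [])

main :: IO ()
main = do
  putStrLn (readInstruction "int foo = 3")    -- Declare Int foo = 3
  putStrLn (readInstruction "int a = b = c")  -- Declare Int a = b = c
```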
A much better alternative is to tokenize your input first. This avoids errors from incomplete patterns and will be much easier to process in the long run. Particularly since you want to write a compiler, you should use a tokenizer, as this is pretty much the accepted way to write parsers. You could instead have:
-- A very simple tokenizer, only splits on whitespace
-- so `int x=1` won't be tokenized correctly
tokenize :: String -> [String]
tokenize = words
readInstructions :: [String] -> (String, [String])
readInstructions ("int" : varName : "=" : val : rest) = ("Declare Int " ++ varName ++ " = " ++ val, rest)
readInstructions otherPatterns = undefined
The reason I return (String, [String]) is so that you can apply readInstructions iteratively and have it consume only as many tokens as it needs for each command. So you could do:
main = do
  program <- readFile "myProgram.prog"
  let tokens = tokenize program
      (firstInstr, tokens') = readInstructions tokens
      (secondInstr, tokens'') = readInstructions tokens'
  putStrLn firstInstr
  putStrLn secondInstr
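Rather than naming tokens', tokens'' and so on by hand, a small recursive function can keep applying readInstructions to the leftover tokens until none remain. This is a sketch: readAll is a hypothetical name, the input is an inline string instead of a file, and the fallback clause calls error where the original left undefined.

```haskell
-- Very simple tokenizer: splits on whitespace only.
tokenize :: String -> [String]
tokenize = words

readInstructions :: [String] -> (String, [String])
readInstructions ("int" : varName : "=" : val : rest) =
  ("Declare Int " ++ varName ++ " = " ++ val, rest)
readInstructions tokens = error ("cannot parse: " ++ unwords tokens)

-- Hypothetical helper: apply readInstructions repeatedly,
-- threading the leftover tokens into the next call.
readAll :: [String] -> [String]
readAll [] = []
readAll tokens = instr : readAll tokens'
  where (instr, tokens') = readInstructions tokens

main :: IO ()
main = mapM_ putStrLn (readAll (tokenize "int a = 5\nint foo = 3"))
```

Threading the leftover tokens from one call into the next is exactly the bookkeeping that the State monad automates.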
If you think this looks laborious, you're right. That's because Haskell has much better, and quite elegant, ways of handling this sort of thing. Once you've completed your assignment, I would encourage you to look at the Parsec library and the State monad. Parsec does a lot of the work for you, both in tokenizing and in turning those tokens into something meaningful, and its parsers thread the remaining input from one step to the next in much the same way the State monad threads its state. Having a good understanding of the State monad will help you as a Haskell programmer in general, as it is used for many different problems.
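To give a feel for the idea, here is a hand-rolled sketch of a State-style parser over a token list. It is essentially State [String] written out by hand (to avoid depending on the mtl package), and the names Parser, token and expect are hypothetical, not Parsec's API. The monad instance passes the leftover tokens along automatically, so readInstruction reads like a description of the syntax. Errors are reported with error for brevity.

```haskell
-- A parser is a function from the remaining tokens to a result plus
-- the tokens it did not consume (the same shape as readInstructions).
newtype Parser a = Parser { runParser :: [String] -> (a, [String]) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \ts -> let (a, ts') = p ts in (f a, ts')

instance Applicative Parser where
  pure a = Parser $ \ts -> (a, ts)
  Parser pf <*> Parser pa = Parser $ \ts ->
    let (f, ts') = pf ts
        (a, ts'') = pa ts'
    in (f a, ts'')

instance Monad Parser where
  Parser p >>= k = Parser $ \ts ->
    let (a, ts') = p ts in runParser (k a) ts'

-- Take one token off the front of the input.
token :: Parser String
token = Parser $ \ts -> case ts of
  (t:rest) -> (t, rest)
  []       -> error "unexpected end of input"

-- Consume one token and check that it is the expected keyword.
expect :: String -> Parser ()
expect s = do
  t <- token
  if t == s then pure () else error ("expected " ++ s ++ ", got " ++ t)

readInstruction :: Parser String
readInstruction = do
  expect "int"
  varName <- token
  expect "="
  val <- token
  pure ("Declare Int " ++ varName ++ " = " ++ val)

main :: IO ()
main = do
  let tokens = words "int a = 5\nint foo = 3"
      (instrs, _leftover) = runParser (sequence [readInstruction, readInstruction]) tokens
  mapM_ putStrLn instrs
```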
Upvotes: 3