Walle

Reputation: 317

Tokenizer identifier in Haskell

I'm writing a small program that classifies each input token as an operator, a parenthesis, or an integer.

However, compilation fails with this error:

Not in scope: data constructor `Integer'

Here's what I have so far (the only thing I import from Data.Char is isDigit):

import Data.Char (isDigit)
data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
    deriving (Show, Eq)


tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
    | c == '+' = TPlus : tokenize cs
    | c == '*' = TTimes : tokenize cs
    | c == '(' = TParenLeft : tokenize cs
    | c == ')' = TParenRight : tokenize cs
    | isDigit c = TNumber Integer (read c) : tokenize cs
    | otherwise = TError : tokenize cs

Some example expected output:

*Main> tokenize "( 1 + 2 )"

should give

[TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight]

and

*Main> tokenize "abc"

should give a single TError, but I'm getting

[TError,TError,TError]

I'd appreciate it if anyone could shed some light on these two issues.

Upvotes: 0

Views: 291

Answers (2)

Mephy

Reputation: 2986

For the Not in scope: data constructor `Integer' error, the problem is the extra Integer in this line:

| isDigit c = TNumber Integer (read c) : tokenize cs

Integer is a type, not a value, so it cannot be passed as an argument to the TNumber constructor. The line should be

| isDigit c = TNumber (read [c]) : tokenize cs

The [c] part is needed because read has type read :: Read a => String -> a, and c is a Char, but [c] is a String containing only the char c.
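As a minimal sketch, here is the question's tokenizer with only that one line changed. Nothing else is assumed beyond the original definitions. Note that a space is not a digit or an operator, so it still falls through to the otherwise branch and yields TError.

```haskell
import Data.Char (isDigit)

data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
    deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
    | c == '+'  = TPlus : tokenize cs
    | c == '*'  = TTimes : tokenize cs
    | c == '('  = TParenLeft : tokenize cs
    | c == ')'  = TParenRight : tokenize cs
    -- [c] wraps the Char in a one-element String, which is what read expects
    | isDigit c = TNumber (read [c]) : tokenize cs
    | otherwise = TError : tokenize cs
```

With this version, tokenize "1+2" evaluates to [TNumber 1,TPlus,TNumber 2].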


tokenize "abc" returns [TError, TError, TError] because the otherwise branch emits one TError for every unrecognized character:

| otherwise = TError : tokenize cs

This leads us to:

tokenize "abc"
-- c = 'a', cs = "bc"
TError : tokenize "bc"
TError : (TError : tokenize "c")
TError : TError : TError : []
[TError, TError, TError]

If you want to collapse a run of consecutive errors into a single TError, drop the TError tokens at the front of the result of tokenizing the rest of the input:

| otherwise = TError : (dropWhile (\o -> o == TError) (tokenize cs))
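Put together, the error-grouping rule might look like the sketch below. It reuses the question's definitions and writes the lambda as the equivalent section (== TError); that shorthand is my substitution, not part of the original answer.

```haskell
import Data.Char (isDigit)

data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
    deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
    | c == '+'  = TPlus : tokenize cs
    | c == '*'  = TTimes : tokenize cs
    | c == '('  = TParenLeft : tokenize cs
    | c == ')'  = TParenRight : tokenize cs
    | isDigit c = TNumber (read [c]) : tokenize cs
    -- Emit one TError, then discard any TError tokens that immediately follow it
    | otherwise = TError : dropWhile (== TError) (tokenize cs)
```

Here tokenize "abc" evaluates to [TError], while errors separated by valid tokens stay separate: tokenize "a+b" evaluates to [TError,TPlus,TError].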

Upvotes: 2

jwodder

Reputation: 57600

When constructing a TNumber, you don't need to (and shouldn't) include the types of the constructor's arguments. Thus, you need to change this:

| isDigit c = TNumber Integer (read c) : tokenize cs

to this (with [c] rather than c, because read takes a String and c is a Char):

| isDigit c = TNumber (read [c]) : tokenize cs
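The question's first example, tokenize "( 1 + 2 )", contains spaces, which the single-character fix still reports as TError. A possible extension, not taken from either answer, is to skip whitespace with isSpace and to read a whole run of digits with span. Both choices are assumptions about the intended behavior.

```haskell
import Data.Char (isDigit, isSpace)

data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
    deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize input@(c:cs)
    | isSpace c = tokenize cs                  -- assumption: whitespace only separates tokens
    | c == '+'  = TPlus : tokenize cs
    | c == '*'  = TTimes : tokenize cs
    | c == '('  = TParenLeft : tokenize cs
    | c == ')'  = TParenRight : tokenize cs
    | isDigit c =
        -- span splits off the longest prefix of digits, so "12" becomes TNumber 12
        let (digits, rest) = span isDigit input
        in TNumber (read digits) : tokenize rest
    | otherwise = TError : tokenize cs
```

With this variant, tokenize "( 1 + 2 )" evaluates to [TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight], matching the expected output in the question.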

Upvotes: 1
