matt3141
matt3141

Reputation: 4423

Haskell Recursive Type

I am attempting to create a function in Haskell returning the Resp type illustrated below in a strange mix between BNF and Haskell types.

elem ::= String | (String, String, Resp)
Resp ::= [elem]

My question is (a) how to define this type in Haskell, and (b) if there is a way of doing so without being forced to use custom constructors, e.g., Node, rather using only tuples and arrays.

Upvotes: 5

Views: 6772

Answers (2)

Chris Taylor
Chris Taylor

Reputation: 47392

You said that "the variety of keywords (data, type, newtype) has been confusing for me". Here's a quick primer on the data construction keywords in Haskell.

Data

The canonical way to create a new type is with the data keyword. A general type in Haskell is a union of product types, each of which is tagged with a constructor. For example, an Employee might be a line worker (with a name and a salary) or a manager (with a name, salary and a list of reports).

We use the String type to represent an employee's name, and the Int type to represent a salaray. A list of reports is just a list of Employees.

data Employee = Worker  String Int
              | Manager String Int [Employee]

Type

The type keyword is used to create type synonyms, i.e. alternate names for the same type. This is typically used to make the source more immediately understandable. For example, we could declare a type Name for employee names (which is really just a String) and Salary for salaries (which are just Ints), and Reports for a list of reports.

type Name    = String
type Salary  = Int
type Reports = [Employee]

data Employee = Worker  Name Salary
              | Manager Name Salary Reports

Newtype

The newtype keyword is similar to the type keyword, but it adds an extra dash of type safety. One problem with the previous block of code is that, although a worker is a combination of a Name and a Salary, there is nothing to stop you using any old String in the Name field (for example, an address). The compiler doesn't distinguish between Names and plain old Strings, which introduces a class of potential bugs.

With the newtype keyword we can make the compiler enforce that the only Strings that can be used in a Name field are the ones explicitly tagged as Names

newtype Name    = Name String
newtype Salary  = Salary Int
newtype Reports = Reports [Employee]

data Employee = Worker  Name Salary
              | Manager Name Salary Reports

Now if we tried to enter a String in the Name field without explicitly tagging it, we get a type error

>>> let kate = Worker (Name "Kate") (Salary 50000)     -- this is ok
>>> let fred = Worker "18 Tennyson Av." (Salary 40000) -- this will fail

<interactive>:10:19:
    Couldn't match expected type `Name' with actual type `[Char]'
    In the first argument of `Worker', namely `"18 Tennyson Av."'
    In the expression: Worker "18 Tennyson Av." (Salary 40000)
    In an equation for `fred':
        fred = Worker "18 Tennyson Av." (Salary 40000)

What's great about this is that because the compiler knows that a Name is really just a String, it optimizes away the extra constructor, so this is just as efficient as using a type declaration -- the extra type safety comes "for free". This requires an important restriction -- a newtype has exactly one constructor with exactly one value. Otherwise the compiler wouldn't know which constructor or value was the correct synonym!

One disadvantage of using a newtype declaration is that now a Salary is no longer just an Int, you can't directly add them together. For example

>>> let kate'sSalary = Salary 50000
>>> let fred'sSalary = Salary 40000
>>> kate'sSalary + fred'sSalary

<interactive>:14:14:
    No instance for (Num Salary)
      arising from a use of `+'
    Possible fix: add an instance declaration for (Num Salary)
    In the expression: kate'sSalary + fred'sSalary
    In an equation for `it': it = kate'sSalary + fred'sSalary

The somewhat complicated error message is telling you that a Salary isn't a numeric type, so you can't add them together (or at least, you haven't told the compiler how to add them together). One option would be to define a function that gets the underlying Int from the Salary

getSalary :: Salary -> Int
getSalary (Salary sal) = sal

but in fact Haskell will write these for you if you use record syntax when declaring your newtypes

data Salary = Salary { getSalary :: Int }

Now you can write

>>> getSalary kate'sSalary + getSalary fred'sSalary
90000

Upvotes: 16

daniel gratzer
daniel gratzer

Reputation: 53881

Part 1:

data Elem = El String | Node String String Resp
type Resp = [Elem]

Part 2: Well... kinda. The unsatisfying answer is: You shouldn't want to because doing so is less type safe. The more direct answer is Elem needs it's own constructor but Resp is easily defined as a type synonym as above. However, I would recommend

newtype Resp = Resp { getElems :: [Elem] }

so that you can't mix up some random list of Elems with a Resp. This also gives you the function getElems so you don't have to do as much pattern matching on a single constructor. The newtype basically let's Haskell know that it should get rid of the overhead of the constructor during runtime so there's no extra indirection which is nice.

Upvotes: 6

Related Questions