Reputation: 47402

When to use a type class, when to use a type

I was revisiting a piece of code I wrote to do combinatorial search a few months ago, and noticed that there was an alternative, simpler way to do something that I'd previously achieved with a type class.

Specifically, I previously had a type class for the type of search problems, which have an states of type s, actions (operations on states) of type a, an initial state, a way of getting a list of (action,state) pairs and a way of testing whether a state is a solution or not:

class Problem p s a where
    initial   :: p s a -> s
    successor :: p s a -> s -> [(a,s)]
    goaltest  :: p s a -> s -> Bool

This is somewhat unsatisfactory, as it requires the MultiParameterTypeClass extension, and generally needs FlexibleInstances and possibly TypeSynonymInstances when you want to make instances of this class. It also clutters up your function signatures, e.g.

pathToSolution :: Problem p => p s a -> [(a,s)]

I noticed today that I can get rid of the class entirely, and use a type instead, along the following lines

data Problem s a {
    initial   :: s,
    successor :: s -> [(a,s)],
    goaltest  :: s -> Bool
}

This doesn't require any extensions, the function signatures look nicer:

pathToSolution :: Problem s a -> [(a,s)]

and, most importantly, I found that after refactoring my code to use this abstraction instead of a type class, I was left with 15-20% fewer lines than I had previously.

The biggest win was in code that created abstractions using the type class - previously I had to create new data structures that wrapped the old ones in a complicated way, and then make them into instances of the Problem class (which required more language extensions) - lots of lines of code to do something relatively simple. After the refactor, I just had a couple of functions that did exactly what I wanted to.

I'm now looking through the rest of the code, trying to spot instances where I can replace type classes with types, and make more wins.

My question is: in what situation does will this refactoring not work? In what cases is it actually just better to use a type class rather than a data type, and how can you recognise those situations ahead of time, so you don't have to go through a costly refactoring?

Upvotes: 37

Answers (3)

Luis Casillas

Reputation: 30237

Your refactoring is closely related to this blog entry by Luke Palmer: "Haskell Antipattern: Existential Typeclass".

I think we can prove that your refactoring will always work. Why? Intuitively, because if some type Foo contains enough information so that we can make it into an instance of your Problem class, we can always write a Foo -> Problem function that "projects" Foo's relevant information into a Problem that contains exactly the information needed.

A bit more formally, we can sketch a proof that your refactoring always works. First, to set the stage, the following code defines a translation of a Problem class instance into a concrete CanonicalProblem type:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

class Problem p s a where
    initial   :: p s a -> s
    successor :: p s a -> s -> [(a,s)]
    goaltest  :: p s a -> s -> Bool

data CanonicalProblem s a = CanonicalProblem {
    initial'   :: s,
    successor' :: s -> [(a,s)],
    goaltest'  :: s -> Bool
}

instance Problem CanonicalProblem s a where
    initial = initial'
    successor = successor'
    goaltest = goaltest'

canonicalize :: Problem p s a => p s a -> CanonicalProblem s a
canonicalize p = CanonicalProblem {
    initial' = initial p,
    successor' = successor p,
    goaltest' = goaltest p
}

Now we want to prove the following:

For any type Foo such that instance Problem Foo s a, it is possible to write a canonicalizeFoo :: Foo s a -> CanonicalProblem s a function that produces the same result as canonicalize when applied to any Foo s a.
It's possible to rewrite any function that uses the Problem class into an equivalent function that uses CanonicalProblem instead. E.g., if you have solve :: Problem p s a => p s a -> r, you can write a canonicalSolve :: CanonicalProblem s a -> r that's equivalent to solve . canonicalize

I'll just sketch proofs. In the case of (1), suppose you have a type Foo with this Problem instance:

instance Problem Foo s a where
    initial = initialFoo
    successor = successorFoo
    goaltest = goaltestFoo

Then given x :: Foo s a you can trivially prove the following by substitution:

-- definition of canonicalize
canonicalize :: Problem p s a => p s a -> CanonicalProblem s a
canonicalize x = CanonicalProblem {
                     initial' = initial x,
                     successor' = successor x,
                     goaltest' = goaltest x
                 }

-- specialize to the Problem instance for Foo s a
canonicalize :: Foo s a -> CanonicalProblem s a
canonicalize x = CanonicalProblem {
                     initial' = initialFoo x,
                     successor' = successorFoo x,
                     goaltest' = goaltestFoo x
                 }

And the latter can be used directly to define our desired canonicalizeFoo function.

In the case of (2), for any function solve :: Problem p s a => p s a -> r (or similar types that involve Problem constraints), and for any type Foo such that instance Problem Foo s a:

Define canonicalSolve :: CanonicalProblem s a -> r' by taking the definition of solve and substituting all occurrences of Problem methods with their CanonicalProblem instance definitions.
Prove that for any x :: Foo s a, solve x is equivalent to canonicalSolve (canonicalize x).

Concrete proofs of (2) require concrete definitions of solve or related functions. A general proof could go either of these two ways:

Induction over all types that have Problem p s a constraints.
Prove that all Problem functions can be written in terms of a small subset of the functions, prove that this subset has CanonicalProblem equivalents, and that the various ways of using them together preserves the equivalence.

Upvotes: 5

C. A. McCann

Reputation: 77424

Consider a situation where both the type and class exist in the same program. The type can be an instance of the class, but that's rather trivial. More interesting is that you can write a function fromProblemClass :: (CProblem p s a) => p s a -> TProblem s a.

The refactoring you performed is roughly equivalent to manually inlining fromProblemClass everywhere you construct something used as a CProblem instance, and making every function that accepts a CProblem instance instead accept TProblem.

Since the only interesting parts of this refactoring are the definition of TProblem and the implementation of fromProblemClass, if you can write a similar type and function for any other class, you can likewise refactor it to eliminate the class entirely.

When does this work?

Think about the implementation of fromProblemClass. You'll essentially be partially applying each function of the class to a value of the instance type, and in the process eliminating any reference to the p parameter (which is what the type replaces).

Any situation where refactoring away a type class is straightforward is going to follow a similar pattern.

When is this counterproductive?

Imagine a simplified version of Show, with only the show function defined. This permits the same refactoring, applying show and replacing each instance with... a String. Clearly we've lost something here--namely, the ability to work with the original types and convert them to a String at various points. The value of Show is that it's defined on a wide variety of unrelated types.

As a rule of thumb, if there are many different functions specific to the types which are instances of the class, and these are often used in the same code as the class functions, delaying the conversion is useful. If there's a sharp dividing line between code that treats the types individually and code that uses the class, conversion functions might be more appropriate with a type class being a minor syntactic convenience. If the types are used almost exclusively through the class functions, the type class is probably completely superfluous.

When is this impossible?

Incidentally, the refactoring here is similar to the difference between a class and interface in OO languages; similarly, the type classes where this refactoring is impossible are those which can't be expressed directly at all in many OO languages.

More to the point, some examples of things you can't translate easily, if at all, in this manner:

The class's type parameter appearing only in covariant position, such as the result type of a function or as a non-function value. Notable offenders here are mempty for Monoid and return for Monad.
The class's type parameter appearing more than once in a function's type may not make this truly impossible but it complicates matters quite severely. Notable offenders here include Eq, Ord, and basically every numeric class.
Non-trivial use of higher kinds, the specifics of which I'm not sure how to pin down, but (>>=) for Monad is a notable offender here. On the other hand, the p parameter in your class is not an issue.
Non-trivial use of multi-parameter type classes, which I'm also uncertain how to pin down and gets horrendously complicated in practice anyway, being comparable to multiple dispatch in OO languages. Again, your class doesn't have an issue here.

Note that, given the above, this refactoring is not even possible for many of the standard type classes, and would be counterproductive for the few exceptions. This is not a coincidence. :]

What do you give up by applying this refactoring?

You give up the ability to distinguish between the original types. This sounds obvious, but it's potentially significant--if there are any situations where you really need to control which of the original class instance types was used, applying this refactoring loses some degree of type safety, which you can only recover by jumping through the same sort of hoops used elsewhere to ensure invariants at run-time.

Conversely, if there are situations where you really need to make the various instance types interchangeable--the convoluted wrapping you mentioned being a classic symptom of this--you gain a great deal by throwing away the original types. This is most often the case where you don't actually care much about the original data itself, but rather about how it lets you operate on other data; thus using records of functions directly is more natural than an extra layer of indirection.

As noted above, this relates closely to OOP and the type of problems it's best suited to, as well as representing the "other side" of the Expression Problem from what's typical in ML-style languages.

Upvotes: 22

Satvik

Reputation: 11218

If you are from OOP backaground. You can think of typeclasses as interfaces in java. They are generally used when you want to provide same interface to different data types, generally involving data type specific implementations for each.

In your case there is no use of using a typeclass, it will only over complicate your code. for more informaction you can always refer to haskellwiki for better understanding. http://www.haskell.org/haskellwiki/OOP_vs_type_classes

The general rule of thumb is: If you are doubtful whether you need type classes or not, then you probably don't need them.

Upvotes: 1