Reputation: 47382
I was revisiting a piece of code I wrote to do combinatorial search a few months ago, and noticed that there was an alternative, simpler way to do something that I'd previously achieved with a type class.
Specifically, I previously had a type class for the type of search problems, which have an states of type s
, actions (operations on states) of type a
, an initial state, a way of getting a list of (action,state) pairs and a way of testing whether a state is a solution or not:
class Problem p s a where
initial :: p s a -> s
successor :: p s a -> s -> [(a,s)]
goaltest :: p s a -> s -> Bool
This is somewhat unsatisfactory, as it requires the MultiParameterTypeClass extension, and generally needs FlexibleInstances and possibly TypeSynonymInstances when you want to make instances of this class. It also clutters up your function signatures, e.g.
pathToSolution :: Problem p => p s a -> [(a,s)]
I noticed today that I can get rid of the class entirely, and use a type instead, along the following lines
data Problem s a {
initial :: s,
successor :: s -> [(a,s)],
goaltest :: s -> Bool
}
This doesn't require any extensions, the function signatures look nicer:
pathToSolution :: Problem s a -> [(a,s)]
and, most importantly, I found that after refactoring my code to use this abstraction instead of a type class, I was left with 15-20% fewer lines than I had previously.
The biggest win was in code that created abstractions using the type class - previously I had to create new data structures that wrapped the old ones in a complicated way, and then make them into instances of the Problem
class (which required more language extensions) - lots of lines of code to do something relatively simple. After the refactor, I just had a couple of functions that did exactly what I wanted to.
I'm now looking through the rest of the code, trying to spot instances where I can replace type classes with types, and make more wins.
My question is: in what situation does will this refactoring not work? In what cases is it actually just better to use a type class rather than a data type, and how can you recognise those situations ahead of time, so you don't have to go through a costly refactoring?
Upvotes: 37
Views: 2465
Reputation: 30227
Your refactoring is closely related to this blog entry by Luke Palmer: "Haskell Antipattern: Existential Typeclass".
I think we can prove that your refactoring will always work. Why? Intuitively, because if some type Foo
contains enough information so that we can make it into an instance of your Problem
class, we can always write a Foo -> Problem
function that "projects" Foo
's relevant information into a Problem
that contains exactly the information needed.
A bit more formally, we can sketch a proof that your refactoring always works. First, to set the stage, the following code defines a translation of a Problem
class instance into a concrete CanonicalProblem
type:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
class Problem p s a where
initial :: p s a -> s
successor :: p s a -> s -> [(a,s)]
goaltest :: p s a -> s -> Bool
data CanonicalProblem s a = CanonicalProblem {
initial' :: s,
successor' :: s -> [(a,s)],
goaltest' :: s -> Bool
}
instance Problem CanonicalProblem s a where
initial = initial'
successor = successor'
goaltest = goaltest'
canonicalize :: Problem p s a => p s a -> CanonicalProblem s a
canonicalize p = CanonicalProblem {
initial' = initial p,
successor' = successor p,
goaltest' = goaltest p
}
Now we want to prove the following:
Foo
such that instance Problem Foo s a
, it is possible to write a canonicalizeFoo :: Foo s a -> CanonicalProblem s a
function that produces the same result as canonicalize
when applied to any Foo s a
.Problem
class into an equivalent function that uses CanonicalProblem
instead. E.g., if you have solve :: Problem p s a => p s a -> r
, you can write a canonicalSolve :: CanonicalProblem s a -> r
that's equivalent to solve . canonicalize
I'll just sketch proofs. In the case of (1), suppose you have a type Foo
with this Problem
instance:
instance Problem Foo s a where
initial = initialFoo
successor = successorFoo
goaltest = goaltestFoo
Then given x :: Foo s a
you can trivially prove the following by substitution:
-- definition of canonicalize
canonicalize :: Problem p s a => p s a -> CanonicalProblem s a
canonicalize x = CanonicalProblem {
initial' = initial x,
successor' = successor x,
goaltest' = goaltest x
}
-- specialize to the Problem instance for Foo s a
canonicalize :: Foo s a -> CanonicalProblem s a
canonicalize x = CanonicalProblem {
initial' = initialFoo x,
successor' = successorFoo x,
goaltest' = goaltestFoo x
}
And the latter can be used directly to define our desired canonicalizeFoo
function.
In the case of (2), for any function solve :: Problem p s a => p s a -> r
(or similar types that involve Problem
constraints), and for any type Foo
such that instance Problem Foo s a
:
canonicalSolve :: CanonicalProblem s a -> r'
by taking the definition of solve
and substituting all occurrences of Problem
methods with their CanonicalProblem
instance definitions.x :: Foo s a
, solve x
is equivalent to canonicalSolve (canonicalize x)
.Concrete proofs of (2) require concrete definitions of solve
or related functions. A general proof could go either of these two ways:
Problem p s a
constraints.Problem
functions can be written in terms of a small subset of the functions, prove that this subset has CanonicalProblem
equivalents, and that the various ways of using them together preserves the equivalence.Upvotes: 5
Reputation: 77374
Consider a situation where both the type and class exist in the same program. The type can be an instance of the class, but that's rather trivial. More interesting is that you can write a function fromProblemClass :: (CProblem p s a) => p s a -> TProblem s a
.
The refactoring you performed is roughly equivalent to manually inlining fromProblemClass
everywhere you construct something used as a CProblem
instance, and making every function that accepts a CProblem
instance instead accept TProblem
.
Since the only interesting parts of this refactoring are the definition of TProblem
and the implementation of fromProblemClass
, if you can write a similar type and function for any other class, you can likewise refactor it to eliminate the class entirely.
Think about the implementation of fromProblemClass
. You'll essentially be partially applying each function of the class to a value of the instance type, and in the process eliminating any reference to the p
parameter (which is what the type replaces).
Any situation where refactoring away a type class is straightforward is going to follow a similar pattern.
Imagine a simplified version of Show
, with only the show
function defined. This permits the same refactoring, applying show
and replacing each instance with... a String
. Clearly we've lost something here--namely, the ability to work with the original types and convert them to a String
at various points. The value of Show
is that it's defined on a wide variety of unrelated types.
As a rule of thumb, if there are many different functions specific to the types which are instances of the class, and these are often used in the same code as the class functions, delaying the conversion is useful. If there's a sharp dividing line between code that treats the types individually and code that uses the class, conversion functions might be more appropriate with a type class being a minor syntactic convenience. If the types are used almost exclusively through the class functions, the type class is probably completely superfluous.
Incidentally, the refactoring here is similar to the difference between a class and interface in OO languages; similarly, the type classes where this refactoring is impossible are those which can't be expressed directly at all in many OO languages.
More to the point, some examples of things you can't translate easily, if at all, in this manner:
The class's type parameter appearing only in covariant position, such as the result type of a function or as a non-function value. Notable offenders here are mempty
for Monoid
and return
for Monad
.
The class's type parameter appearing more than once in a function's type may not make this truly impossible but it complicates matters quite severely. Notable offenders here include Eq
, Ord
, and basically every numeric class.
Non-trivial use of higher kinds, the specifics of which I'm not sure how to pin down, but (>>=)
for Monad
is a notable offender here. On the other hand, the p
parameter in your class is not an issue.
Non-trivial use of multi-parameter type classes, which I'm also uncertain how to pin down and gets horrendously complicated in practice anyway, being comparable to multiple dispatch in OO languages. Again, your class doesn't have an issue here.
Note that, given the above, this refactoring is not even possible for many of the standard type classes, and would be counterproductive for the few exceptions. This is not a coincidence. :]
You give up the ability to distinguish between the original types. This sounds obvious, but it's potentially significant--if there are any situations where you really need to control which of the original class instance types was used, applying this refactoring loses some degree of type safety, which you can only recover by jumping through the same sort of hoops used elsewhere to ensure invariants at run-time.
Conversely, if there are situations where you really need to make the various instance types interchangeable--the convoluted wrapping you mentioned being a classic symptom of this--you gain a great deal by throwing away the original types. This is most often the case where you don't actually care much about the original data itself, but rather about how it lets you operate on other data; thus using records of functions directly is more natural than an extra layer of indirection.
As noted above, this relates closely to OOP and the type of problems it's best suited to, as well as representing the "other side" of the Expression Problem from what's typical in ML-style languages.
Upvotes: 22
Reputation: 11208
If you are from OOP backaground. You can think of typeclasses as interfaces in java. They are generally used when you want to provide same interface to different data types, generally involving data type specific implementations for each.
In your case there is no use of using a typeclass, it will only over complicate your code. for more informaction you can always refer to haskellwiki for better understanding. http://www.haskell.org/haskellwiki/OOP_vs_type_classes
The general rule of thumb is: If you are doubtful whether you need type classes or not, then you probably don't need them.
Upvotes: 1